Abstract



Functional relationships between objects, called "attributes", are of considerable impor- 
tance in knowledge representation languages, including Description Logics (DLs). A study 
of the literature indicates that papers have made, often implicitly, different assumptions 
about the nature of attributes: whether they are always required to have a value, or whether 
they can be partial functions. The work presented here is the first explicit study of this 
difference for subclasses of the CLASSIC DL, involving the same-as concept constructor.
It is shown that although determining subsumption between concept descriptions has the 
same complexity (though requiring different algorithms), the story is different in the case 
of determining the least common subsumer (lcs). For attributes interpreted as partial 
functions, the lcs exists and can be computed relatively easily; even in this case our results 
correct and extend three previous papers about the lcs of DLs. In the case where attributes 
must have a value, the lcs may not exist, and even if it exists it may be of exponential size. 
Interestingly, it is possible to decide in polynomial time if the lcs exists. 

1. Introduction 

Knowledge representation systems based on Description Logics (DLs) have been the subject
of continued attention in Artificial Intelligence, both as a subject of theoretical studies
(Borgida, 1994; Baader, 1996; Baader & Sattler, 2000; Giacomo & Lenzerini, 1996;
Calvanese, Giacomo, & Lenzerini, 1999b) and in applications (Artale, Franconi, Guarino, &
Pazzi, 1996; Brachman, McGuinness, Patel-Schneider, & Borgida, 1999; McGuinness &
Patel-Schneider, 1998). More impressively, DLs have found applications in other areas
involving information processing, such as databases (Borgida, 1995; Calvanese, Lenzerini,
& Nardi, 1999), semi-structured data (Calvanese, Giacomo, & Lenzerini, 1998, 1999a),
information integration (Calvanese, Giacomo, Lenzerini, Nardi, & Rosati, 1998; Borgida
& Küsters, 2000), as well as more general problems such as configuration (McGuinness
& Wright, 1998) and software engineering (Borgida & Devanbu, 1999; Devanbu & Jones,
1997). In fact, wherever the ubiquitous term "ontology" is used these days (e.g., for
providing the semantics of web/XML documents), DLs are prime contenders because of their
clear semantics and well-studied computational properties.

In Description Logics, one takes an object-centered view, where the world is modeled as 
individuals, connected by binary relationships (here called roles), and grouped into classes 
(called concepts). For those more familiar with Predicate Logic, objects correspond to 
constants, roles to binary predicates, and concepts to unary predicates. In every DL system, 
the concepts of the application domain are described by concept descriptions that are built 
from atomic concepts and roles using the "constructors" provided by the DL language. For 
example, consider a situation where we want a concept describing individual cars that have 
had frequent (at least 10) repairs, and also record the fact that for cars, their model is the 
same as their manufacturer's model. Concepts can be thought of as being built up from 
(possibly nested) simpler noun-phrases, so the above concept, called Lemon in the sequel, 
might be captured as the conjunction of 

(objects that are Cars) 

(things all of whose model values are in concept Model) 

(things all of whose madeBy values are in concept Manufacturer) 

(things whose model value is the same as the model of the madeBy attribute) 

(things with at least 10 repairs values) 

(things all of whose repairs values are RepairReport). 

Using the syntax of the CLASSIC language, we can abbreviate the above, while emphasizing 
the term-like nature of descriptions and the constructors used in each: 

(and Car 

(all model Model) 

(all madeBy Manufacturer) 

(same-as (model) (madeBy o model)) 

(at-least 10 repairs) 

(all repairs RepairReport)) 

So, for example, the concept term (at-least n p) has constructor at-least, and denotes 
objects which are related by the relationship p to at least n other objects; in turn, (all p 
C) has as instances exactly those objects which are related by p only to instances of C. 

Finally, we present the same concept in a mathematical notation which is more succinct 
and preferred in formal work on DLs: 

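$$
\begin{aligned}
\mathsf{Lemon} := {} & \mathsf{Car} \sqcap {} \\
& \forall \mathsf{model}.\mathsf{Model} \sqcap {} \\
& \forall \mathsf{madeBy}.\mathsf{Manufacturer} \sqcap {} \\
& \mathsf{madeBy} \downarrow (\mathsf{model} \circ \mathsf{madeBy}) \sqcap {} \\
& \geq 10\ \mathsf{repairs} \sqcap {} \\
& \forall \mathsf{repairs}.\mathsf{RepairReport}
\end{aligned}
$$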

Unlike preceding formalisms, such as semantic networks and frames (Quillian, 1968; Minsky, 
1975), DLs are equipped with a formal semantics, which can be given by a translation into 
first-order predicate logic (Borgida, 1994), for example. Moreover, DL systems provide their
users with various inference capabilities that allow them to deduce implicit knowledge from 
the explicitly represented knowledge. For instance, the subsumption algorithms allow one to 
determine subconcept-superconcept relationships: C is subsumed by D ($C \sqsubseteq D$) if and only
if all instances of C are also instances of D, i.e., the first description is always interpreted 
as a subset of the second description. For example, the concept Car obviously subsumes the 
concept description Lemon, while (at-least 10 repairs) is subsumed by (at-least 8 repairs). 

The traditional inference problems for DL systems, such as subsumption, inconsistency
detection, and membership checking, are by now well-investigated. Algorithms and detailed
complexity results for realizing such inferences are available for a variety of DLs of differing
expressive power; see, e.g., (Baader & Sattler, 2000) for an overview.

1.1 Least Common Subsumer 

The least common subsumer (lcs) of concepts is the most specific concept description
subsuming the given concepts. Finding the lcs was first introduced as a new inference problem
for DLs by Cohen, Borgida, and Hirsh (1992). One motivation for considering the lcs is to
use it as an alternative to disjunction. The idea is to replace disjunctions like
$C_1 \sqcup \cdots \sqcup C_n$ by the lcs of $C_1, \ldots, C_n$. Borgida and Etherington (1989)
call this operation knowledge-base vivification. Although, in general, the lcs is not
equivalent to the corresponding disjunction, it is the best approximation of the disjunctive
concept within the available language. Using such an approximation is motivated by the fact
that, in many cases, adding disjunction would increase the complexity of reasoning.¹

As proposed by Baader et al. (Baader & Küsters, 1998; Baader, Küsters, & Molitor,
1999), the lcs operation can be used to support the "bottom-up" construction of DL
knowledge bases, where, roughly speaking, starting from "typical" examples an lcs algorithm
is used to compute a concept description that (i) contains all these examples, and (ii) is
the most specific description satisfying property (i). Baader and Küsters have presented
such an algorithm for cyclic $\mathcal{ALN}$-concept descriptions; $\mathcal{ALN}$ is a
relatively simple language allowing for concept conjunction, primitive negation, value
restrictions, and number restrictions. Also, Baader et al. (1999) have proposed an lcs
algorithm for a DL allowing existential restrictions instead of number restrictions.

Originally, the lcs was introduced as an operation in the context of inductive learning
from examples (Cohen et al., 1992), and several papers followed up this lead. The DLs
considered were mostly sublanguages of CLASSIC which allowed for same-as equalities, i.e.,
expressions like (same-as (madeBy) (model o madeBy)). Cohen et al. proposed an lcs
algorithm for $\mathcal{ALN}$ and a language that allows for concept conjunction and same-as,
which we will call $\mathcal{S}$. The algorithm for $\mathcal{S}$ was extended by Cohen and
Hirsh (1994a) to CORE-CLASSIC, which additionally allows for value restrictions (see
(Cohen & Hirsh, 1994b) for experimental results). Finally, Frazier and Pitt (1996) presented
an lcs algorithm for full CLASSIC.



1. Observe that if the language already allows for disjunction, we have
$\mathrm{lcs}(C_1, \ldots, C_n) \equiv C_1 \sqcup \cdots \sqcup C_n$. In
particular, this means that, for such languages, the lcs is not really of interest.
1.2 Total vs. Partial Attributes

In most knowledge representation systems, including DLs, functional relationships, here
called attributes (also called "features" in the literature), are distinguished as a subclass
of general relationships, at least in part because functional restrictions occur so frequently
in practice.² In the above example, clearly madeBy and model are meant to be attributes,
thus making unnecessary number restrictions like (and (at-most 1 madeBy) (at-least
1 madeBy)). In addition, distinguishing attributes helps identify tractable subsets of DL
constructors: in CLASSIC, coreferences between attribute chains (as in the above examples)
can be reasoned with efficiently (Borgida & Patel-Schneider, 1994), while if we changed to
roles, e.g., allowed (same-as (repairs) (ownedBy o repairsPaidFor)), the subsumption
problem becomes undecidable (Schmidt-Schauß, 1989).

Whereas the distinction between roles and attributes in DLs is both theoretically and 
practically well understood, we have discovered that another distinction, namely the one be- 
tween attributes being interpreted as total functions (total attributes) and those interpreted 
as partial functions (partial attributes), has "slipped through the cracks" of contemporary 
research. A total attribute always has a value in "the world out there", even if we do not
know it in the knowledge base currently. A partial attribute may not have a value. This 
distinction is useful in practice, since there is a difference between a car possibly, but not 
necessarily, having a CD player, and the car necessarily having a manufacturer (which just 
may not be known in the current knowledge base). The latter is modeled by defining the 
attribute madeBy to be a total attribute. Note that with madeBy being a total attribute, 
every individual in the world of discourse (not only cars) must have a filler for madeBy. 
Since, however, no structural information is provided for fillers of madeBy of non-car indi- 
viduals, all implications drawn about these fillers are trivial. Thus, making madeBy a total 
attribute seems reasonable in this case. A car's CD player, on the other hand, should be 
modeled by a partial attribute to express the fact that cars are not required to have a CD 
player. To indicate that a particular car does have a CD player, one would have to add the 
description (at-least 1 CDplayer).

1.3 New Results 

As mentioned above, in conjunction with the same-as constructor, roles and attributes 
behave very differently with respect to subsumption. The main objective of this paper is to 
show that the distinction between total and partial attributes induces significantly different 
behaviour in computing the lcs, in the presence of same-as. More precisely, the purpose of 
this paper is twofold. 

First, we show that with respect to the complexity of deciding subsumption there is no 
difference between partial and total attributes. Borgida and Patel-Schneider (1994) have 
shown that when attributes are total, subsumption of CLASSIC concept descriptions can 
be decided in polynomial time. As shown in the present work, slight modifications of the 
algorithm proposed by Borgida and Patel-Schneider suffice to handle partial attributes. 



2. Readers coming from the Machine Learning community should be aware of the difference between our
"attributes" (functional roles) and their "attributes", which are components of an input feature vector
that usually describes an exemplar.
Moreover, these modifications do not change the complexity of the algorithm. Thus, partial
and total attributes behave very similarly from the subsumption point of view. 

Second, and this is the more surprising result of this paper, the distinction between 
partial and total attributes does have a significant impact on the problem of computing the 
lcs. Previous results on sublanguages of CLASSIC show that if partial attributes are used, 
the lcs of two concept descriptions always exists, and can be computed in polynomial time. 
If, however, only total attributes are involved, the situation is very different. The lcs need
no longer even exist, and in case it exists its size may grow exponentially in the size of the
given concept descriptions. Nevertheless, the existence of the lcs of two concept descriptions
can be decided in polynomial time. 

Specifically, in previous work (Cohen et al., 1992; Cohen & Hirsh, 1994a; Frazier &
Pitt, 1996) concerning the lcs computation in CLASSIC, constructions and proofs have been
made without realizing the difference between the two types of attributes. Without going 
into details here, the main problem for lcs is that merely finite graphs have been employed, 
making the constructions applicable only for the partial attribute case. In addition to fixing 
these problems, this paper also presents the proper handling of inconsistent concepts in the 
lcs algorithm for CLASSIC presented by Frazier and Pitt (1996). 

Although our results about subsumption are not as intriguing, the proofs to show the 
results on the lcs make extensive use of the corresponding subsumption algorithms, which 
is one reason we present them beforehand in this paper. 

Returning to the general differences between the cases of total and partial attributes, 
one could say that the fundamental cause for the differences lies in the same-as constructor, 
whose semantics normally requires that (i) the two chains of attributes each have a value, 
and (ii) that these values coincide. In the case of total attributes, same-as obeys the principle 

$C \sqsubseteq u \downarrow v$ implies $C \sqsubseteq u \circ w \downarrow v \circ w$

where $u$, $v$, and $w$ are sequences of total attributes, e.g., (madeBy o model), because condition
(i) is ensured by the total aspect of all the attributes. In the case of partial attributes, the
above implication does not hold, because $w$, and hence $u \circ w$, is no longer guaranteed to have
a value, implying that the same-as restriction may not hold. Clearly, this implication affects
the results of subsumption. As far as lcs is concerned, a certain graph (representing the lcs
of the two given concepts) may be infinite in the case of total attributes, thus jeopardizing
the existence of the lcs.
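
The failure of this implication for partial attributes can be checked on a tiny interpretation. The following Python sketch is only an illustration: encoding attributes as dictionaries (a missing key meaning "no value") and the helper names `image` and `same_as` are assumptions made here, not notation from the paper. It also assumes that `None` is never used as an individual.

```python
from typing import Dict, Hashable, Optional, Sequence

# Assumed encoding: an attribute is a dict; a missing key means "no value".
Attr = Dict[Hashable, Hashable]

def image(chain: Sequence[Attr], d: Hashable) -> Optional[Hashable]:
    """Follow a chain of attributes from d; return None if a step is undefined."""
    for attr in chain:
        if d not in attr:
            return None
        d = attr[d]
    return d

def same_as(u: Sequence[Attr], v: Sequence[Attr], d: Hashable) -> bool:
    """Standard semantics: both chains are defined on d and their values coincide."""
    x, y = image(u, d), image(v, d)
    return x is not None and y is not None and x == y

# Domain {"d", "e"}; the attributes a and b are total and agree on d.
a = {"d": "e", "e": "e"}
b = {"d": "e", "e": "e"}
w_partial: Attr = {}                  # w has no value on any individual
w_total = {"d": "d", "e": "e"}        # w has a value on every individual

holds_uv = same_as([a], [b], "d")                              # d satisfies a same-as b
holds_partial = same_as([a, w_partial], [b, w_partial], "d")   # extension by partial w
holds_total = same_as([a, w_total], [b, w_total], "d")         # extension by total w
print(holds_uv, holds_partial, holds_total)
```

With the partial `w`, the individual `d` satisfies the shorter equality but not the extended one, whereas with the total `w` both hold.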

The more general significance of our result is that knowledge representation language 
designers and users need to explicitly check at the beginning whether they deal with to- 
tal or partial attributes because the choice can have significant effects. Although in some 
situations total attributes are convenient, to guarantee the existence of attributes without 
having to resort to number restrictions, our results show that they can have drawbacks. 
All things considered, requiring all attributes to be total appears to be less desirable. Con- 
cerning CLASSIC, the technical results in this paper support the use of partial attributes 
because these ensure the existence of the lcs and its computation in polynomial time as 
well as the efficient decision of subsumption. Moreover, the current implementation of the 
CLASSIC subsumption algorithm does not require major changes in order to handle partial 
attributes. 
The outline of this paper is as follows: In the following section, the basic notions
necessary for our investigations are introduced. Then, in the two subsequent sections,
subsumption and lcs computation in CLASSIC with partial attributes are investigated. More
precisely, in Section 3 we offer a subsumption algorithm for the sublanguage CLASSIC⁻ of
CLASSIC, which contains all main CLASSIC-constructors; in Section 4, we present an lcs
algorithm for CLASSIC⁻ concept descriptions, along the lines of that proposed by Cohen and
Hirsh (1994a), and formally prove its correctness, thereby resolving some shortcomings of
previous lcs algorithms, which did not handle inconsistencies properly. Finally, Section 5
covers the central new result of this paper, i.e., the lcs computation in the presence of
total attributes. For this section, we restrict our investigations to the sublanguage
$\mathcal{S}$ of CLASSIC⁻ in order to concentrate on the changes caused by going from partial
to total attributes. Nevertheless, we strongly conjecture that all the results proved in this
section can easily be extended to CLASSIC⁻ and CLASSIC using techniques similar to those
employed in the two previous sections.

2. Formal Preliminaries 

In this section, we introduce the syntax and semantics of the description languages consid- 
ered in this paper and formally define subsumption and equivalence of concept descriptions. 
Finally, the least common subsumer of concept descriptions is specified. 

Definition 1 Let $\mathcal{C}$, $\mathcal{R}$, and $\mathcal{A}$ be disjoint finite sets representing
the set of concept names, the set of role names, and the set of attribute names. The set of all
CLASSIC⁻-concept descriptions over $\mathcal{C}$, $\mathcal{R}$, and $\mathcal{A}$ is inductively
defined as follows:

• Every element of $\mathcal{C}$ is a concept description (concept name, like Car).

• The symbol $\top$ is a concept description (top concept, denoting the universe of all
objects).

• If $r \in \mathcal{R}$ is a role and $n \geq 0$ is a nonnegative integer, then $\leq n\, r$ and
$\geq n\, r$ are concept descriptions (number restrictions, like $\geq 10$ repairs).

• If $C$ and $D$ are concept descriptions, then $C \sqcap D$ is a concept description (concept
conjunction).

• If $C$ is a concept description and $r$ is a role or an attribute, then $\forall r.C$ is a concept
description (value restriction, like $\forall$madeBy.Manufacturer).

• If $k, h \geq 0$ are non-negative integers and $a_1, \ldots, a_k, b_1, \ldots, b_h \in \mathcal{A}$
are attributes, then $a_1 \circ \cdots \circ a_k \downarrow b_1 \circ \cdots \circ b_h$ is a concept
description (same-as equality, like madeBy $\downarrow$ model $\circ$ madeBy). Note that the two
sequences may be empty, i.e., $k = 0$ or $h = 0$. The empty sequence is denoted by $\varepsilon$.

Often we dispense with $\circ$ in the composition of attributes. For example, the sequence
$a_1 \circ \cdots \circ a_k$ is simply written as $a_1 \cdots a_k$. Moreover, we will use
$\forall r_1 \cdots r_n.C$ as abbreviation of $\forall r_1.\forall r_2 \cdots \forall r_n.C$, where
we have $\forall \varepsilon.C$ in case $n = 0$, and this denotes $C$.

As usual, the semantics of CLASSIC⁻ is defined in a model-theoretic way by means of
interpretations.
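Definition 2 An interpretation $\mathcal{I}$ consists of a nonempty domain $\Delta^{\mathcal{I}}$
and an interpretation function $\cdot^{\mathcal{I}}$. The interpretation function assigns extensions
to atomic identifiers as follows:

• The extension of a concept name $E$ is some subset $E^{\mathcal{I}}$ of the domain.

• The extension of a role name $r$ is some subset $r^{\mathcal{I}}$ of
$\Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$.

• The extension of an attribute name $a$ is some partial function $a^{\mathcal{I}}$ from
$\Delta^{\mathcal{I}}$ to $\Delta^{\mathcal{I}}$, i.e., if $(x, y_1) \in a^{\mathcal{I}}$ and
$(x, y_2) \in a^{\mathcal{I}}$ then $y_1 = y_2$.

Given roles or attributes $r_i$, we use $(r_1 \cdots r_n)^{\mathcal{I}}$ to denote the composition
of the binary relations $r_i^{\mathcal{I}}$. If $n = 0$ then the result is $\varepsilon^{\mathcal{I}}$,
which denotes the identity relation, i.e.,
$\varepsilon^{\mathcal{I}} := \{(d, d) \mid d \in \Delta^{\mathcal{I}}\}$. For an individual
$d \in \Delta^{\mathcal{I}}$, we define $r^{\mathcal{I}}(d) := \{e \mid (d, e) \in r^{\mathcal{I}}\}$.
If the $r_i$'s are attributes, we say that $(r_1 \cdots r_n)^{\mathcal{I}}$ is defined for $d$ iff
$(r_1 \cdots r_n)^{\mathcal{I}}(d) \neq \emptyset$; occasionally, we will refer to
$(r_1 \cdots r_n)^{\mathcal{I}}(d)$ as the image of $d$ under $(r_1 \cdots r_n)^{\mathcal{I}}$.

The extension $C^{\mathcal{I}}$ of a concept description $C$ is inductively defined as follows:

• $\top^{\mathcal{I}} := \Delta^{\mathcal{I}}$;

• $(\geq n\, r)^{\mathcal{I}} := \{d \in \Delta^{\mathcal{I}} \mid \mathrm{cardinality}(\{e \in \Delta^{\mathcal{I}} \mid (d, e) \in r^{\mathcal{I}}\}) \geq n\}$;

• $(\leq n\, r)^{\mathcal{I}} := \{d \in \Delta^{\mathcal{I}} \mid \mathrm{cardinality}(\{e \in \Delta^{\mathcal{I}} \mid (d, e) \in r^{\mathcal{I}}\}) \leq n\}$;

• $(C \sqcap D)^{\mathcal{I}} := C^{\mathcal{I}} \cap D^{\mathcal{I}}$;

• $(\forall r.C)^{\mathcal{I}} := \{d \in \Delta^{\mathcal{I}} \mid r^{\mathcal{I}}(d) \subseteq C^{\mathcal{I}}\}$ where $r$ is a role or an attribute;

• $(a_1 \cdots a_k \downarrow b_1 \cdots b_h)^{\mathcal{I}} := \{d \in \Delta^{\mathcal{I}} \mid (a_1 \cdots a_k)^{\mathcal{I}}$ and $(b_1 \cdots b_h)^{\mathcal{I}}$ are defined for $d$
and $(a_1 \cdots a_k)^{\mathcal{I}}(d) = (b_1 \cdots b_h)^{\mathcal{I}}(d)\}$.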
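
For a finite interpretation, the inductive definition of extensions can be executed directly. The sketch below is a hedged illustration, not code from the paper: the tuple encoding of concept descriptions, the dictionary layout of the interpretation, and all function names are assumptions introduced here. Roles and attributes are both stored as sets of pairs; attributes are simply expected to be functional.

```python
def successors(interp, r, d):
    """r^I(d): all e such that (d, e) is in the extension of r."""
    return {e for (x, e) in interp["rel"].get(r, set()) if x == d}

def chain_image(interp, chain, d):
    """(r_1 ... r_n)^I(d); the empty chain yields {d} (identity relation)."""
    current = {d}
    for r in chain:
        current = {e for x in current for e in successors(interp, r, x)}
    return current

def extension(interp, concept):
    """Extension of a concept description, following the inductive definition."""
    dom = interp["domain"]
    kind = concept[0]
    if kind == "top":
        return set(dom)
    if kind == "name":
        return set(interp["concepts"].get(concept[1], set()))
    if kind == "atleast":
        _, n, r = concept
        return {d for d in dom if len(successors(interp, r, d)) >= n}
    if kind == "atmost":
        _, n, r = concept
        return {d for d in dom if len(successors(interp, r, d)) <= n}
    if kind == "and":
        return extension(interp, concept[1]) & extension(interp, concept[2])
    if kind == "all":
        _, r, sub = concept
        sub_ext = extension(interp, sub)
        return {d for d in dom if successors(interp, r, d) <= sub_ext}
    if kind == "same":
        _, u, v = concept
        result = set()
        for d in dom:
            img_u = chain_image(interp, u, d)
            # Both chains must be defined (non-empty image) and agree.
            if img_u and img_u == chain_image(interp, v, d):
                result.add(d)
        return result
    raise ValueError(f"unknown constructor: {kind!r}")

# A small hypothetical interpretation: car2 has a manufacturer but no model.
interp = {
    "domain": {"car1", "car2", "acme", "m1"},
    "concepts": {"Car": {"car1", "car2"}, "Model": {"m1"}},
    "rel": {
        "madeBy": {("car1", "acme"), ("car2", "acme")},
        "model": {("car1", "m1"), ("acme", "m1")},
    },
}
coref = ("and", ("name", "Car"), ("same", ("model",), ("madeBy", "model")))
all_models = ("all", "model", ("name", "Model"))
print(extension(interp, coref), extension(interp, all_models) == interp["domain"])
```

In this interpretation only `car1` satisfies the coreference, because the `model` chain is undefined for `car2`, while the value restriction holds vacuously for every individual without a `model` filler.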

Note that in the above definition attributes are interpreted as partial functions. Since the 
main point of this paper is to demonstrate the impact of different semantics for attributes, 
we occasionally restrict the set of interpretations to those that map attributes to total 
functions. Such interpretations are called t-interpretations and the attributes interpreted
in this way are called total attributes in order to distinguish them from partial ones. 

We stress, as remarked in the introduction, that in the definition of
$(a_1 \cdots a_k \downarrow b_1 \cdots b_h)^{\mathcal{I}}$, $a_1 \cdots a_k$ and $b_1 \cdots b_h$
must be defined on $d$ in order for $d$ to satisfy the same-as restriction.
Although this is the standard semantics for same-as equalities, one could also think of 
relaxing this restriction. For example, the same-as condition might be specified to hold if 
either both paths are undefined or both images are defined and have identical values. A 
third definition might be satisfied if even just one of the paths is undefined. Each of these 
definitions of the semantics of same-as might lead to different results. However, in this 
paper we only pursue the standard semantics. 
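
The three readings of same-as mentioned above differ only in how undefined paths are treated. The following Python sketch contrasts them; it is an illustration under assumptions made here (attributes as dictionaries, `None` marking an undefined path, hypothetical function names) and is not part of the formal development.

```python
def image(chain, d):
    """Value of an attribute chain on d, or None if some step is undefined."""
    for attr in chain:
        if d not in attr:
            return None
        d = attr[d]
    return d

def same_as_standard(u, v, d):
    """Standard semantics: both paths defined and equal."""
    x, y = image(u, d), image(v, d)
    return x is not None and y is not None and x == y

def same_as_both_or_neither(u, v, d):
    """Relaxed reading: both paths undefined, or both defined and equal."""
    x, y = image(u, d), image(v, d)
    if x is None and y is None:
        return True
    return x is not None and y is not None and x == y

def same_as_weak(u, v, d):
    """Weakest reading: satisfied as soon as one path is undefined."""
    x, y = image(u, d), image(v, d)
    if x is None or y is None:
        return True
    return x == y

# Hypothetical data: ann's spouse is bob, but bob has no recorded spouse.
spouse = {"ann": "bob"}

# epsilon same-as spouse o spouse, evaluated on ann: the second path is undefined.
on_ann = (
    same_as_standard([], [spouse, spouse], "ann"),
    same_as_both_or_neither([], [spouse, spouse], "ann"),
    same_as_weak([], [spouse, spouse], "ann"),
)
# spouse same-as spouse, evaluated on bob: both paths are undefined.
on_bob = (
    same_as_standard([spouse], [spouse], "bob"),
    same_as_both_or_neither([spouse], [spouse], "bob"),
    same_as_weak([spouse], [spouse], "bob"),
)
print(on_ann, on_bob)
```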

The subsumption relationship between concept descriptions is defined as follows. 

Definition 3 A concept description $C$ is subsumed by the concept description $D$ ($C \sqsubseteq D$
for short) if and only if for all interpretations $\mathcal{I}$, $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$.
If we consider only total interpretations, we get t-subsumption: $C \sqsubseteq_t D$ iff
$C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ for all t-interpretations $\mathcal{I}$.

Having defined subsumption, equivalence of concept descriptions is defined in the usual way:
$C \equiv D$ if and only if $C \sqsubseteq D$ and $D \sqsubseteq C$. T-equivalence $C \equiv_t D$ is
specified analogously.

As already mentioned in the introduction, the main difference between partial and total
attributes with respect to subsumption is that
$u \downarrow v \sqsubseteq_t u \circ w \downarrow v \circ w$ holds for all attribute
chains $u$, $v$, $w$, whereas it is not necessarily the case that
$u \downarrow v \sqsubseteq u \circ w \downarrow v \circ w$.

Finally, before introducing the lcs operation formally and concluding this section, we
comment on the expressive power of CLASSIC⁻, since (syntactically) CLASSIC⁻ lacks some
common constructors. Although CLASSIC⁻, as introduced here, does not contain the bottom
concept $\bot$ explicitly, it can be expressed by, e.g., $(\geq 1\, r) \sqcap (\leq 0\, r)$. We
will use $\bot$ as an abbreviation for inconsistent concept descriptions. Furthermore, primitive
negation, i.e., negation of concept names, can be simulated by number restrictions. For a
concept name $E$ one can replace every occurrence of $E$ by $(\geq 1\, r_E)$ and the negation
$\neg E$ of $E$ by $(\leq 0\, r_E)$ where $r_E$ is a new role name. Finally, for an attribute $a$
the following equivalences hold:
$(\geq n\, a) \equiv \bot$ for $n \geq 2$; $(\geq 1\, a) \equiv (a \downarrow a)$;
$(\geq 0\, a) \equiv \top$; $(\leq n\, a) \equiv \top$ for $n \geq 1$; and
$(\leq 0\, a) \equiv (\forall a.\bot)$. These show that we do not lose any expressive power by
not allowing for number restrictions on attributes. Still, full CLASSIC is somewhat more
expressive than CLASSIC⁻. This is mainly due to the introduction of individuals (also called
nominals) in CLASSIC. For the sake of completeness we give the syntax of the full CLASSIC
language.³ This requires a further set, $\mathcal{O}$, representing the set of individual
names. Then we can define two additional concept constructors

• $\{e_1, \ldots, e_m\}$, for individuals $e_i \in \mathcal{O}$ (enumeration as in {Fall, Summer, Spring})

• p : e for a role or attribute p, and an individual e (fills as in currentSeason : Summer). 

In a technical report, Küsters and Borgida (1999) extend some of the results presented in
this work to full CLASSIC, in the case when individuals have a non-standard semantics. 

The least common subsumer of a set of concept descriptions is the most specific concept 
subsuming all concept descriptions of the set: 

Definition 4 The concept description $D$ is the least common subsumer (lcs) of the concept
descriptions $C_1, \ldots, C_n$ ($\mathrm{lcs}(C_1, \ldots, C_n)$ for short) iff i) $C_i \sqsubseteq D$
for all $i = 1, \ldots, n$ and ii) for every $D'$ with that property $D \sqsubseteq D'$. Analogously,
we define $\mathrm{lcs}_t(C_1, \ldots, C_n)$ using $\sqsubseteq_t$ instead of $\sqsubseteq$.

Note that the lcs of concept descriptions may not exist, but if it does, by definition it is 
uniquely determined up to equivalence. In this sense, we may refer to the lcs. 
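
To make Definition 4 concrete, consider a much smaller fragment than the languages studied here, assumed only for this illustration: a concept description is a conjunction of concept names, encoded as a set of names, with the empty set standing for the top concept. In that fragment, C is subsumed by D exactly when every name of D also occurs in C, and the lcs is the conjunction of the names shared by all inputs. The Python sketch below uses hypothetical function names and is not the lcs algorithm developed in this paper.

```python
from functools import reduce
from typing import FrozenSet, Iterable

# Assumed encoding: a conjunction of concept names as a frozenset of names.
Concept = FrozenSet[str]

def subsumed_by(c: Concept, d: Concept) -> bool:
    """C is subsumed by D iff every conjunct of D is also a conjunct of C."""
    return d <= c

def lcs(concepts: Iterable[Concept]) -> Concept:
    """Most specific common subsumer: keep exactly the shared conjuncts."""
    items = list(concepts)
    if not items:
        raise ValueError("lcs of an empty collection is not defined here")
    return reduce(lambda x, y: x & y, items)

sports_car = frozenset({"Car", "Fast", "Expensive"})
taxi = frozenset({"Car", "ForHire"})
common = lcs([sports_car, taxi])

# Condition i) of the definition: every input is subsumed by the result.
is_common_subsumer = all(subsumed_by(c, common) for c in (sports_car, taxi))
print(sorted(common), is_common_subsumer)
```

Condition ii) holds in this fragment because any other common subsumer can only contain names shared by all inputs, so it is a subset of the computed set.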

In the following two sections, attributes are always interpreted as partial functions; only 
in Section 5 do we consider total attributes. 

3. Characterizing Subsumption in CLASSIC

In this section we modify the characterization of t-subsumption for CLASSIC, as proposed
by Borgida and Patel-Schneider (1994), to handle the case of partial attributes. We do
so in detail, because the tools used for deciding subsumption are intimately related to the
computation of lcs.

3. Even here we are omitting constructs dealing with integers and other so-called "host individuals", which
cannot have roles of their own and can only act as role/attribute fillers.

T-subsumption in CLASSIC is decided by a multi-part process. First, descriptions are
turned into description graphs. Next, description graphs are put into canonical form, where 
certain inferences are explicated and other redundancies are reduced by combining nodes 
and edges in the graph. Finally, t-subsumption is determined between a description and a 
canonical description graph. 

In order to "inherit" the proofs, we have tried to minimize the necessary adjustments to 
the specification in (Borgida & Patel-Schneider, 1994). For this reason, roughly speaking,
attributes are treated as roles unless they form part of a same-as equality. (Note that 
attributes participating in a same-as construct must have values!) To some extent, this 
will allow us to adopt the semantics of the original description graphs, which is crucial for 
proofs. However, the two different occurrences of attributes, namely, in a same-as equality 
vs. a role in a value-restriction, require us to modify and extend the definition of description 
graphs, the normalization rules, and the subsumption algorithm itself. 

In the following, we present the steps of the subsumption algorithm in detail. We start 
with the definition of description graphs. 

3.1 Description Graphs 

Intuitively, description graphs reflect the syntactic structure of concept descriptions. A 
description graph is a labeled, directed multigraph, with a distinguished node. Roughly 
speaking, the edges (a-edges) of the graph capture the constraints expressed by same-as 
equalities. The labels of nodes contain, among others, a set of so-called r-edges, which 
correspond to value restrictions. Unlike the description graphs defined by Borgida and 
Patel-Schneider, here the r-edges are not only labeled with role names but also with attribute 
names. (We shall comment later on the advantage of this modification in order to deal with 
partial attributes.) The r-edges lead to nested description graphs, representing the concepts 
of the corresponding value restrictions. 

Before defining description graphs formally, in Figure 1 we present a graph corresponding 
to the concept description Lemon defined in the introduction. We use G(Manufacturer), 
G(Model), as well as G(RepairReport) to denote description graphs for the concept names 
Manufacturer, Model, and RepairReport. These graphs are very simple; they merely consist 
of one node, labeled with the corresponding concept name. In general, such graphs can 
be more complex since a value restriction like Vr.C leads to a (possibly complex) nested 
concept description C. 

Although number restrictions on attributes are not allowed, r-edges labeled with at- 
tributes, like model and madeBy, always have the restriction [0, 1] in order to capture the 
semantics of attributes. Formally, description graphs, nodes, and edges are defined mutually 
recursively as follows: 

Definition 5 A description graph $G$ is a tuple $(N, E, n_0, l)$, consisting of a finite set $N$ of
nodes; a finite set $E$ of edges (a-edges); a distinguished node $n_0 \in N$ (root of the graph);
and a function $l$ from $N$ into the set of labels of nodes. We will occasionally use the notation
G.Nodes, G.Edges, and G.root to access the components $N$, $E$ and $n_0$ of the graph $G$.

An a-edge is a tuple of the form $(n_1, a, n_2)$ where $n_1$, $n_2$ are nodes and $a$ is an attribute
name.

A label of a node is defined to be $\bot$ or a tuple of the form $(C, H)$, consisting of a finite
set $C$ of concept names (the atoms of the node) and a finite set $H$ of tuples (the r-edges of
the node). Concept names in a description graph stand for atomic concept names and $\top$.
We will occasionally use the notation n.Atoms and n.REdges to access the components $C$
and $H$ of the node $n$.

An r-edge is a tuple, $(r, m, M, G')$, consisting of a role or attribute name, $r$; a min, $m$,
which is a non-negative integer; a max, $M$, which is a non-negative integer or $\infty$; and a
(recursively nested) description graph $G'$. The graph $G'$ will often be called the restriction
graph of the node for the role $r$. We require the nodes of $G'$ to be distinct from all the nodes
of $G$ and other nested description graphs of $G$. If $r$ is an attribute, then we require: $m = 0$
and $M \in \{0, 1\}$.

Given a description graph $G$ and a node $n \in$ G.Nodes, we define $G|_n$ to be the graph
$(N, E, n, l)$; $G|_n$ is said to be rooted at $n$. A sequence $p = n_0 a_1 n_1 a_2 \cdots a_k n_k$
with $k \geq 0$ and $(n_{i-1}, a_i, n_i) \in$ G.Edges, $i = 1, \ldots, k$, is called path in $G$ from
the node $n_0$ to $n_k$ ($p \in G$ for short); for $k = 0$ the path $p$ is called empty;
$w = a_1 \cdots a_k$ is called the label of $p$ (the empty path has label $\varepsilon$); $p$ is called
rooted if $n_0$ is the root of $G$. Occasionally, we write $n_0 a_1 \cdots a_k n_k \in G$ omitting the
intermediate nodes.

Throughout this work we make the assumption that description graphs are connected. 
A description graph is said to be connected if all nodes of the graph can be reached by a 
rooted path and all nested graphs are connected. The semantics of description graphs (see 
Definition 6) is not altered if nodes that cannot be reached from the root are deleted. 

In order to merge description graphs we need the notion of "recursive set of nodes" of 
a description graph G: The recursive set of nodes of G is the union of the nodes of G and 
the recursive set of nodes of all nested description graphs of G. 
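
The mutually recursive structure of description graphs, the recursive set of nodes, and the connectedness assumption can be sketched as plain data structures. The Python below is only an illustration: the class and field names are assumptions chosen here, `None` stands for the label of an inconsistent node, and the example graph is a guessed reconstruction in the spirit of the Lemon concept, since Figure 1 is not reproduced in this text.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class REdge:
    role: str          # role or attribute name
    lo: int            # min
    hi: float          # max: a non-negative integer or math.inf
    graph: "Graph"     # nested restriction graph

@dataclass
class Label:
    atoms: Set[str] = field(default_factory=set)
    r_edges: List[REdge] = field(default_factory=list)

@dataclass
class Graph:
    nodes: Set[str]
    edges: Set[Tuple[str, str, str]]        # a-edges (n1, attribute, n2)
    root: str
    labels: Dict[str, Optional[Label]]      # None encodes the bottom label

def nested_graphs(g: Graph) -> List[Graph]:
    """Restriction graphs attached to the nodes of g (one level deep)."""
    return [e.graph for lab in g.labels.values() if lab is not None
            for e in lab.r_edges]

def recursive_nodes(g: Graph) -> Set[str]:
    """Nodes of g together with the recursive node sets of all nested graphs."""
    result = set(g.nodes)
    for sub in nested_graphs(g):
        result |= recursive_nodes(sub)
    return result

def is_connected(g: Graph) -> bool:
    """All nodes reachable from the root via a-edges, recursively for nested graphs."""
    reached, frontier = {g.root}, [g.root]
    while frontier:
        n = frontier.pop()
        for (n1, _a, n2) in g.edges:
            if n1 == n and n2 not in reached:
                reached.add(n2)
                frontier.append(n2)
    return reached == g.nodes and all(is_connected(s) for s in nested_graphs(g))

def atom_graph(node: str, name: str) -> Graph:
    """One-node graph for a concept name, such as G(Model)."""
    return Graph({node}, set(), node, {node: Label({name})})

root_label = Label({"Car"}, [
    REdge("model", 0, 1, atom_graph("m0", "Model")),
    REdge("madeBy", 0, 1, atom_graph("f0", "Manufacturer")),
    REdge("repairs", 10, math.inf, atom_graph("r0", "RepairReport")),
])
lemon = Graph(
    nodes={"n0", "n1", "n2"},
    edges={("n0", "model", "n1"), ("n0", "madeBy", "n2"), ("n2", "model", "n1")},
    root="n0",
    labels={"n0": root_label, "n1": Label(), "n2": Label()},
)
print(sorted(recursive_nodes(lemon)), is_connected(lemon))
```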

Just as for concept descriptions, the semantics of description graphs is defined by means
of an interpretation $\mathcal{I}$. We introduce a function $T$ which assigns an individual of the
domain of $\mathcal{I}$ to every node of the graph. This ensures that all same-as equalities are
satisfied.






What's in an Attribute? 



Definition 6 Let $G = (N, E, n_0, l)$ be a description graph and let $\mathcal{I}$ be an interpretation.

An element, $d$, of $\Delta^{\mathcal{I}}$ is in $G^{\mathcal{I}}$, iff there is some total function,
$T$, from $N$ into $\Delta^{\mathcal{I}}$ such that

1. $d = T(n_0)$;

2. for all $n \in N$, $T(n) \in n^{\mathcal{I}}$; and

3. for all $(n_1, a, n_2) \in E$ we have $(T(n_1), T(n_2)) \in a^{\mathcal{I}}$.

The extension $n^{\mathcal{I}}$ of a node $n$ with label $\bot$ is the empty set. An element, $d$, of
$\Delta^{\mathcal{I}}$ is in $n^{\mathcal{I}}$, where $l(n) = (C, H)$, iff

1. for all $B \in C$, we have $d \in B^{\mathcal{I}}$; and

2. for all $(r, m, M, G') \in H$,

(a) there are between $m$ and $M$ elements, $d'$, of the domain such that $(d, d') \in r^{\mathcal{I}}$;
and

(b) $d' \in G'^{\mathcal{I}}$ for all $d'$ such that $(d, d') \in r^{\mathcal{I}}$.

Cohen and Hirsh (1994a) defined the semantics of description graphs in a different way, 
avoiding the introduction of a total function T. The problem with their definition is, 
however, that it is only well-defined for acyclic graphs, which, for example, excludes same-as
equalities of the form $\varepsilon \downarrow \mathsf{spouse} \circ \mathsf{spouse}$, or even
$p \downarrow p \circ q$.

The semantics of the graphs proposed by Borgida and Patel-Schneider (1994) is similar
to Definition 6. However, in that paper a-edges captured not only same-as equalities but
also all value restrictions on attributes. In the context of partial attributes, this design
would prevent us from defining the semantics of description graphs by means of a total
function $T$, since some attributes might not have fillers. Specifying the semantics of
description graphs in terms of partial mappings $T$ would make the definition even longer.
Furthermore, the proofs in (Borgida & Patel-Schneider, 1994) would not carry over as easily.
Therefore, in order to keep $T$ a total function, value restrictions of attributes are initially
always translated into r-edges. The next section will present the translation of concept
descriptions into description graphs in detail.

Having defined the semantics of description graphs, subsumption and equivalence between
description graphs (e.g., $H \sqsubseteq G$) as well as between concept descriptions and description
graphs (e.g., $C \sqsubseteq G$) are defined in the same way as subsumption and equivalence between
concept descriptions.

3.2 Translating Concept Descriptions into Description Graphs 

Following Borgida and Patel-Schneider (1994), a CLASSIC - concept description is turned 
into a description graph by a recursive process. In this process, nodes and description 
graphs are often merged. 

Definition 7 The merge of two nodes, $n_1 \oplus n_2$, is a new node $n$ with the following label:
if $n_1$ or $n_2$ has label $\bot$, then the label of $n$ is $\bot$. Otherwise, if both labels are not equal to
$\bot$, then $n.\mathit{Atoms} = n_1.\mathit{Atoms} \cup n_2.\mathit{Atoms}$ and $n.\mathit{REdges} = n_1.\mathit{REdges} \cup n_2.\mathit{REdges}$.

If $G_1 = (N_1, E_1, n_1, l_1)$ and $G_2 = (N_2, E_2, n_2, l_2)$ are two description graphs with disjoint
recursive sets of nodes, then the merge of $G_1$ and $G_2$, $G := G_1 \oplus G_2 = (N, E, n_0, l)$, is defined
as follows:

1. $n_0 := n_1 \oplus n_2$;

2. $N := (N_1 \cup N_2 \cup \{n_0\}) \setminus \{n_1, n_2\}$;

3. $E := (E_1 \cup E_2)[n_1/n_0, n_2/n_0]$, i.e., $E$ is the union of $E_1$ and $E_2$ where every occurrence
of $n_1$, $n_2$ is substituted by $n_0$;

4. $l(n) := l_1(n)$ for all $n \in N_1 \setminus \{n_1\}$; $l(n) := l_2(n)$ for all $n \in N_2 \setminus \{n_2\}$; and $l(n_0)$ is
defined by the label obtained by merging $n_1$ and $n_2$.
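
The merge operation of Definition 7 can be illustrated by a small Python sketch. The encoding is an assumption made for illustration only: a graph is a tuple (nodes, edges, root, labels), a label is either `BOTTOM` or a pair (atoms, r-edges) of frozensets, and the names `merge_labels`, `merge_graphs` and `new_root` are hypothetical. Nested restriction graphs are treated as opaque values.

```python
BOTTOM = None  # hypothetical stand-in for the label of an incoherent node


def merge_labels(label1, label2):
    """Merge two node labels; a label is BOTTOM or a pair (atoms, r_edges)."""
    if label1 is BOTTOM or label2 is BOTTOM:
        return BOTTOM
    atoms1, r_edges1 = label1
    atoms2, r_edges2 = label2
    return (atoms1 | atoms2, r_edges1 | r_edges2)


def merge_graphs(g1, g2, new_root="n0"):
    """Merge two graphs (nodes, edges, root, labels) with disjoint node sets."""
    nodes1, edges1, root1, labels1 = g1
    nodes2, edges2, root2, labels2 = g2
    if nodes1 & nodes2 or new_root in nodes1 | nodes2:
        raise ValueError("node sets must be disjoint and new_root must be fresh")

    def rename(node):
        # Every occurrence of an old root is substituted by the new root.
        return new_root if node in (root1, root2) else node

    nodes = ((nodes1 | nodes2) - {root1, root2}) | {new_root}
    edges = {(rename(s), a, rename(t)) for (s, a, t) in edges1 | edges2}
    labels = {n: labels1[n] for n in nodes1 - {root1}}
    labels.update({n: labels2[n] for n in nodes2 - {root2}})
    labels[new_root] = merge_labels(labels1[root1], labels2[root2])
    return nodes, edges, new_root, labels


empty = (frozenset(), frozenset())
g1 = ({"a0", "a1"}, {("a0", "model", "a1")}, "a0",
      {"a0": (frozenset({"Car"}), frozenset()), "a1": empty})
g2 = ({"b0", "b1"}, {("b0", "madeBy", "b1")}, "b0",
      {"b0": (frozenset({"Vehicle"}), frozenset()), "b1": empty})
print(merge_graphs(g1, g2))
```

In this sketch the two roots collapse into one node that carries the union of both atom sets, while all other nodes keep their labels.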

Now, a CLASSIC~-concept description C can be turned into its corresponding description 
graph G(C) by the following translation rules. 

1. $\top$ is turned into a description graph with one node $n_0$ and no a-edges. The only atom
of the node is $\top$ and the set of r-edges is empty.

2. A concept name is turned into a description graph with one node and no a-edges. The
atoms of the node contain only the concept name and the node has no r-edges.

3. A description of the form $(\geq n\, r)$ is turned into a description graph with one node and
no a-edges. The node has as its atoms $\top$ and it has a single r-edge $(r, n, \infty, G(\top))$
where $G(\top)$ is specified by the first translation rule.

4. A description of the form $(\leq n\, r)$ is turned into a description graph with one node
and no a-edges. The node has as its atom $\top$ and it has a single r-edge $(r, 0, n, G(\top))$.

5. A description of the form $a_1 \cdots a_p \downarrow b_1 \cdots b_q$ is turned into a graph with pairwise
distinct nodes $n_1, \ldots, n_{p-1}, m_1, \ldots, m_{q-1}$, the root $m_0 := n_0$, and an additional node
$n_p = m_q := n$; the set of a-edges consists of $(n_0, a_1, n_1), (n_1, a_2, n_2), \ldots, (n_{p-1}, a_p, n_p)$
and $(m_0, b_1, m_1), (m_1, b_2, m_2), \ldots, (m_{q-1}, b_q, m_q)$, i.e., two disjoint paths which
coincide on their starting point, $n_0$, and their final point, $n$. (Note that for $p = 0$ the first
path is the empty path from $n_0$ to $n_0$ and for $q = 0$ the second path is the empty path
from $n_0$ to $n_0$.) All nodes have $\top$ as their only atom and no r-edges.

6. A description of the form $\forall r.C$, where $r$ is a role, is turned into a description graph
with one node and no a-edges. The node has the atom set $\{\top\}$ and it has a single r-edge
$(r, 0, \infty, G(C))$.

7. A description of the form $\forall a.C$, where $a$ is an attribute, is turned into a description
graph with one node and no a-edges. The node has the atom set $\{\top\}$ and it has a single
r-edge $(a, 0, 1, G(C))$. (In the work by Borgida and Patel-Schneider, the concept
description $\forall a.C$ is turned into an a-edge. As already mentioned, this would cause
problems for attributes interpreted as partial functions when defining the semantics
by means of $T$ as specified in Definition 6.)

8. To turn a description of the form $C \sqcap D$ into a description graph, construct $G(C)$ and
$G(D)$ and merge them.
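
Translation rule 5 is the only rule that creates a-edges, so a short sketch may help. The following Python fragment builds the a-edges for a same-as equality between two attribute paths. The function name `same_as_graph` and the node names `n0`, `n`, `n1`, `m1`, and so on are hypothetical, and node labels are omitted because every node simply carries $\top$.

```python
def same_as_graph(path_a, path_b):
    """Build (nodes, a-edges, root) for the same-as equality path_a = path_b."""
    root = "n0"
    # If one of the paths is empty, the common end node is the root itself.
    end = root if (not path_a or not path_b) else "n"
    nodes, edges = {root, end}, set()

    def add_path(path, prefix):
        current = root
        for i, attribute in enumerate(path, start=1):
            # The last step of each path leads to the shared end node.
            target = end if i == len(path) else f"{prefix}{i}"
            nodes.add(target)
            edges.add((current, attribute, target))
            current = target

    add_path(path_a, "n")
    add_path(path_b, "m")
    return nodes, edges, root


print(same_as_graph(["madeBy"], ["model", "madeBy"]))
print(same_as_graph([], ["spouse", "spouse"]))  # yields a cycle through the root
```

The second call shows why cyclic graphs arise naturally: an equality with the empty path on one side forces the other path to return to the root.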

Figure 1 shows the description graph built in this way for the concept Lemon of our example. 
It can easily be verified that the translation preserves extensions: 

Theorem 1 A concept description $C$ and its corresponding description graph $G(C)$ are
equivalent, i.e., $C^{\mathcal{I}} = G(C)^{\mathcal{I}}$ for every interpretation $\mathcal{I}$.

The main difficulty in the proof of this theorem is in showing that merging two description 
graphs corresponds to the conjunction of concept descriptions. 

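Lemma 1 For all interpretations $\mathcal{I}$, if $n_1$ and $n_2$ are nodes, then $(n_1 \oplus n_2)^{\mathcal{I}} = n_1^{\mathcal{I}} \cap n_2^{\mathcal{I}}$; if
$G_1$ and $G_2$ are description graphs, then $(G_1 \oplus G_2)^{\mathcal{I}} = G_1^{\mathcal{I}} \cap G_2^{\mathcal{I}}$.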

The proof of the preceding statement is rather simple and similar to the one in (Borgida &
Patel-Schneider, 1994).

3.3 Translating Description Graphs to Concept Descriptions 

Although the characterization of subsumption does not require translating description 
graphs back to concept descriptions, this translation is presented here to show that con- 
cept descriptions and description graphs are equivalent representations of CLASSIC - concept 
descriptions. In subsequent sections, we will in fact need to turn graphs into concept de- 
scriptions. 

The translation of a description graph G can be specified in a rather straightforward 
recursive definition. The main idea of the translation stems from Cohen and Hirsh (1994a), 
who employed spanning trees to translate same-as equalities. A spanning tree of a (con- 
nected) graph is a tree rooted at the same node as the graph and containing all nodes of the 
graph. In particular, it coincides with the graph except that some a-edges are deleted. For 
example, one possible spanning tree T for G in Figure 1 is obtained by deleting the a-edge 
labeled madeBy, whose origin is the root of G. 

Now, let $G$ be a connected description graph and $T$ be a spanning tree for it. Then,
the corresponding concept description $C_G$ is obtained as a conjunction of the following
descriptions:

1. $C_G$ contains (i) a same-as equality $v \downarrow v$ for every leaf $n$ of $T$, where $v$ is the label
of the rooted path in $T$ to $n$; and (ii) a same-as equality $v_1 \circ a \downarrow v_2$ for each a-edge
$(n_1, a, n_2) \in G.\mathit{Edges}$ not contained in $T$, where $v_i$ is the label of the rooted path to
$n_i$ in $T$, $i = 1, 2$.

2. For every node $n$ in $T$, $C_G$ contains a value restriction $\forall v.C_n$, where $v$ is the label of
the rooted path in $T$ to $n$, and $C_n$ denotes the translation of the label of $n$, i.e., $C_n$ is
a conjunction obtained as follows:

• every concept name in the atoms of $n$ is a conjunct in $C_n$;

• for every r-edge $(r, m, M, G')$ of $n$, $C_n$ contains (a) the number restrictions $(\geq m\, r)$
and $(\leq M\, r)$ (in case $r$ is a role and $M \neq \infty$) and (b) the value restriction $\forall r.C_{G'}$,
where $C_{G'}$ is the recursively defined translation of $G'$.

In case the set of atoms and r-edges of $n$ is empty, define $C_n := \top$.

Referring to the graph $G$ in Figure 1, $C_G$ contains the same-as equalities
$\mathit{model} \circ \mathit{madeBy} \downarrow \mathit{model} \circ \mathit{madeBy}$ and $\mathit{madeBy} \downarrow \mathit{model} \circ \mathit{madeBy}$. Furthermore, if $n_0$ denotes
the root of $G$, $C_G$ has the value restrictions $\forall \varepsilon.C_{n_0}$, $\forall \mathit{model}.\top$, and $\forall \mathit{model}\, \mathit{madeBy}.\top$,
where $C_{n_0}$ corresponds to Lemon as defined in the introduction, but without the same-as equality.
Note that, although in this case the same-as equality $\mathit{model} \circ \mathit{madeBy} \downarrow \mathit{model} \circ \mathit{madeBy}$ is not
needed, one cannot dispense with 1.(i) in the construction above, as illustrated by the
following example: Without 1.(i), the description graph $G(a \downarrow a)$ would be turned into the
description $\top$, which is not equivalent to $a \downarrow a$ since the same-as equality requires that the
path $a$ has a value, which may not be the case.
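
Step 1 of this construction can be sketched in Python as follows. This is an illustration under assumptions, not the construction used in the cited proofs: the function name `same_as_equalities` is hypothetical, the spanning tree is chosen by breadth-first search over sorted edges (any spanning tree would do, so the tree may differ from the one described for Figure 1), and the three-node example graph is only modelled on the Lemon example.

```python
from collections import deque


def same_as_equalities(edges, root):
    """Return the same-as equalities of step 1 as pairs of attribute paths."""
    path_to = {root: ()}          # label of the rooted tree path to each node
    tree_edges = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for (source, attribute, target) in sorted(edges):
            if source == node and target not in path_to:
                path_to[target] = path_to[source] + (attribute,)
                tree_edges.add((source, attribute, target))
                queue.append(target)

    inner_nodes = {source for (source, _, _) in tree_edges}
    equalities = []
    # 1.(i): v = v for every leaf, so that the path v is forced to have a value.
    for node in sorted(path_to, key=lambda n: path_to[n]):
        if node not in inner_nodes:
            equalities.append((path_to[node], path_to[node]))
    # 1.(ii): v1 . a = v2 for every a-edge that is not part of the tree.
    for (source, attribute, target) in sorted(edges - tree_edges):
        equalities.append((path_to[source] + (attribute,), path_to[target]))
    return equalities


lemon_like = {("n0", "model", "n1"), ("n1", "madeBy", "n2"), ("n0", "madeBy", "n2")}
print(same_as_equalities(lemon_like, "n0"))
print(same_as_equalities({("n0", "a", "n")}, "n0"))  # the graph of a = a
```

For the single-edge graph of $a \downarrow a$ the sketch returns exactly the equality between `('a',)` and itself, which is the case that rule 1.(i) is needed for.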

It is easy to prove that the translation thus defined is correct in the following sense
(Küsters & Borgida, 1999).

Lemma 2 Every connected description graph $G$ is equivalent to its translation $C_G$, i.e., for
all interpretations $\mathcal{I}$: $G^{\mathcal{I}} = C_G^{\mathcal{I}}$.

3.4 Canonical Description Graphs 

In the following we occasionally refer to "marking a node incoherent"; this means that the
label of this node is changed to $\bot$. "Marking a description graph as incoherent" means that
the description graph is replaced by the graph $G(\bot)$ corresponding to $\bot$, i.e., the graph
consisting only of one node with label $\bot$.

One important property of canonical description graphs is that they are deterministic,
i.e., every node has at most one outgoing edge (a-edge or r-edge) labeled with the same
attribute or role name. Following Borgida and Patel-Schneider (1994), in order to turn a
description graph into a canonical graph we need to merge a-edges and r-edges. In addition,
unlike in their work, it might be necessary to "lift" r-edges to a-edges.

To merge two a-edges $(n, a, n_1)$ and $(n, a, n_2)$ in a description graph $G$, replace them
with a single new edge $(n, a, n')$ where $n'$ is the result of merging $n_1$ and $n_2$. In addition,
replace $n_1$ and $n_2$ by $n'$ in all other a-edges of $G$.

In order to merge two r-edges $(r, s_1, k_1, G_1)$, $(r, s_2, k_2, G_2)$ replace them by the new r-edge
$(r, \max(s_1, s_2), \min(k_1, k_2), G_1 \oplus G_2)$.
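
As a small illustration of the r-edge merge, the following Python sketch combines two r-edges for the same role. The names `merge_r_edges` and `has_empty_range` are hypothetical, and the restriction graphs are stood in for by frozensets of atoms whose merge is plain set union, which is a simplifying assumption rather than the full graph merge.

```python
import math


def merge_r_edges(edge1, edge2, merge_graphs):
    """Merge two r-edges (role, min, max, graph) that carry the same role."""
    role1, min1, max1, graph1 = edge1
    role2, min2, max2, graph2 = edge2
    if role1 != role2:
        raise ValueError("only r-edges with the same role can be merged")
    # The merged edge keeps the tighter of the two bounds on each side.
    return (role1, max(min1, min2), min(max1, max2), merge_graphs(graph1, graph2))


def has_empty_range(edge):
    """True if the minimum exceeds the maximum, i.e. no filler count fits."""
    _, minimum, maximum, _ = edge
    return minimum > maximum


at_least_two = ("wheel", 2, math.inf, frozenset({"Round"}))
at_most_one = ("wheel", 0, 1, frozenset({"Small"}))
merged = merge_r_edges(at_least_two, at_most_one, frozenset.union)
print(merged, has_empty_range(merged))
```

A merged edge whose minimum exceeds its maximum is the situation in which the owning node has to be marked incoherent.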

To lift up an r-edge $(a, m, M, G_a)$ of a node $n$ in a description graph $G$ with an a-edge
$(n, a, n_1)$, remove it from $n.\mathit{REdges}$, and augment $G$ by adding $G_a.\mathit{Nodes}$ to $G.\mathit{Nodes}$,
$G_a.\mathit{Edges}$ to $G.\mathit{Edges}$, as well as adding $(n, a, G_a.\mathit{Root})$ to $G.\mathit{Edges}$. A precondition for
applying this transformation is that $M = 1$, or $M = 0$ and $G_a$ corresponds to the graph
$G(\bot)$. The reason for this precondition is that if an r-edge of the form $(a, 0, 0, G_a)$ is lifted
without $G_a$ being inconsistent, the fact that no a-successors are allowed is lost. Normalization
rule 5 (see below) will guarantee that this precondition can always be satisfied.

A description graph G is transformed into canonical form by exhaustively applying the 
following normalization rules. A graph is called canonical if none of these rules can be 
applied. 

1. If some node in $G$ is marked incoherent, mark the description graph as incoherent.
(Reason: Even if the node is not a root, attributes corresponding to a-edges must always
have a value (since they participate in same-as equalities), and this value cannot belong to the
empty set.)

2. If some r-edge in a node has its min greater than its max, mark the node incoherent.
(Reason: $(\geq 2\, r) \sqcap (\leq 1\, r) \equiv \bot$.)

3. Add $\top$ to the atoms of every node, if absent.

4. If some r-edge in a node has its restriction graph marked incoherent, change its max
to 0. (Reason: $(\leq 0\, r) \equiv \forall r.\bot$.)

5. If some r-edge in a node has a max of 0, mark its restriction graph as incoherent.
(Reason: See 4.)

6. If some r-edge is of the form $(r, 0, \infty, G')$ where $G'$ only contains one node with empty
set of atoms or with the atoms set to $\{\top\}$ and no r-edges, then remove this r-edge.
(Reason: $\forall r.\top \equiv \top$.)

7. If some node has two r-edges labeled with the same role, merge the two edges, as
described above. (Reason: $\forall r.C \sqcap \forall r.D \equiv \forall r.(C \sqcap D)$.)

8. If some description graph has two a-edges from the same node labeled with the
same attribute, merge the two edges, as described above. (Reason: $\forall a.C \sqcap \forall a.D \equiv
\forall a.(C \sqcap D)$.)

9. If some node in a graph has both an a-edge and an r-edge for the same attribute, then
"lift up the r-edge" if the precondition is satisfied (see above). (Reason: The value
restrictions imposed on attributes that participate in same-as equalities must be made explicit
and gathered at one place, similar to the previous two cases.)

We need to show that the transformations to canonical form do not change the semantics
of the graph. The main difficulty is in showing that the merging processes and the lifting
preserve the semantics. The only difference from (Borgida & Patel-Schneider, 1994) is that
in addition to merging r-edges and a-edges we also need to lift up r-edges. Therefore,
we omit the proofs showing that merging edges preserves extensions. The proofs of the
following two lemmas are routine and quite similar to the one of Lemma 5.

Lemma 3 Let $G = (N, E, n_0, l)$ be a description graph with two mergeable a-edges and let
$G' = (N', E', n', l')$ be the result of merging these two a-edges. Then, $G \equiv G'$.

Lemma 4 Let $n$ be a node with two mergeable r-edges and let $n'$ be the node with these
edges merged. Then, $n^{\mathcal{I}} = n'^{\mathcal{I}}$ for every interpretation $\mathcal{I}$.

Lemma 5 Let $G = (N, E, n_0, l)$ be a description graph with node $n$ and a-edge $(n, a, n'')$.
Suppose $n$ has an associated r-edge $(a, m, M, G_a)$. Provided that the precondition for lifting
r-edges is satisfied and that $G' = (N', E', n', l')$ is the result of this transformation, then
$G \equiv G'$.

Proof. It is sufficient to show that $G|_n \equiv G'|_n$, since only the label of $n$ is changed in $G'$
and only $n$ obtains an additional a-edge, which points to the graph $G_a$ not connected to
the rest of $G'$. W.l.o.g. we therefore may assume that $n$ is the root of $G$, i.e., $n = n_0$. Let
$d \in G^{\mathcal{I}}$. Thus, there is a function $T$ from $N$ into $\Delta^{\mathcal{I}}$ as specified in Definition 6 and an
individual $e$ such that $d = T(n)$, $e = T(n'')$, and $(d, e) \in a^{\mathcal{I}}$. Since $d \in n^{\mathcal{I}}$ and $n$ has the
r-edge $(a, m, M, G_a)$, this implies $e \in G_a^{\mathcal{I}}$. Hence,
there exists a function $T'$ from $G_a.\mathit{Nodes}$ into $\Delta^{\mathcal{I}}$ for $G_a$ and $e$ satisfying the conditions in
Definition 6. Since the sets of nodes of $G$ and $G_a$ are disjoint, we can define $T''$ to be the
union of $T$ and $T'$, i.e., $T''(m) := T(m)$ for all nodes $m$ in $G$ and $T''(m) := T'(m)$ for all
nodes $m$ in $G_a$. Since, by construction, for the additional a-edge $(n, a, G_a.\mathit{Root}) \in E'$ we
have $(T''(n), T''(G_a.\mathit{Root})) \in a^{\mathcal{I}}$, it follows that all conditions in Definition 6 are satisfied
for $d$ and $G'$, and thus, $d \in G'^{\mathcal{I}}$.

Now let $d \in G'^{\mathcal{I}}$. Thus, there is a function $T''$ from $N'$ into $\Delta^{\mathcal{I}}$ according to Definition 6.
Let $e := T''(G_a.\mathit{Root}) = T''(n'')$. Let $G''$ be the description graph we obtain from $G'$ by
deleting the nodes corresponding to $G_a$, which is the same graph as $G$ without the r-edge
$(a, m, M, G_a)$. If we restrict $T''$ to the nodes of $G''$, then it follows $d \in G''^{\mathcal{I}}$. Furthermore,
restricting $T''$ to the nodes of $G_a$ yields $e \in G_a^{\mathcal{I}}$. In particular, $G_a$ cannot be marked
incoherent. Then, our precondition ensures $M = 1$. Thus, since $e$ is the only a-successor of
$d$, we can conclude $d \in G^{\mathcal{I}}$. □

Having dealt with the issue of merging and lifting, it is now easy to verify that "normaliza- 
tion" does not affect the meaning of description graphs. 

Theorem 2 If $G$ is a description graph and $G'$ is the corresponding canonical description
graph, then $G \equiv G'$.

As an example, the canonical description graph of the graph given in Figure 1 is depicted 
in Figure 2. 



3.5 Subsumption Algorithm 

The final part of the subsumption process is checking to see if a canonical description graph
is subsumed by a concept description. As in Borgida and Patel-Schneider (1994), where
attributes are total, it turns out that it is not necessary to turn the potential subsumer
into a canonical description graph. The subsumption algorithm presented next can also be
considered as a characterization of subsumption.




Algorithm 1 (Subsumption Algorithm) Given a concept description $D$ and description
graph $G = (N, E, n_0, l)$, subsumes?$(D, G)$ is defined to be true if and only if one of the
following conditions holds:

1. The description graph $G$ is marked incoherent.

2. $D$ is a concept name or $\top$, and $D$ is an element of the atoms of $n_0$.

3. $D$ is $(\geq n\, r)$ and (i) some r-edge of $n_0$ has $r$ as its role, and min greater than or equal
to $n$; or (ii) $n = 0$.

4. $D$ is $(\leq n\, r)$ and some r-edge of $n_0$ has $r$ as its role, and max less than or equal to $n$.

5. $D$ is $a_1 \cdots a_n \downarrow b_1 \cdots b_m$, and there are rooted paths with label $a_1 \cdots a_n$ and $b_1 \cdots b_m$
in $G$ ending at the same node.

6. $D$ is $\forall r.C$, for a role $r$, and either (i) some r-edge of $n_0$ has $r$ as its role and $G'$
as its restriction graph with subsumes?$(C, G')$; or (ii) subsumes?$(C, G(\top))$. (Reason:
$\forall r.\top \equiv \top$.)

7. $D$ is $\forall a.C$, for an attribute $a$, and (i) some a-edge of $G$ is of the form $(n_0, a, n')$, and
subsumes?$(C, (N, E, n', l))$; or (ii) some r-edge of $n_0$ has $a$ as its attribute, and $G'$ as
its restriction graph with subsumes?$(C, G')$; or (iii) subsumes?$(C, G(\top))$.

8. $D$ is $E \sqcap F$ and both subsumes?$(E, G)$ and subsumes?$(F, G)$ are true.

There are only two differences between this algorithm and the one for total attributes
presented by Borgida and Patel-Schneider (see also Algorithm 2). First, in the partial attribute
case, given $D = \forall a.C$, one needs to look up the value restriction either in some a-edge or
some r-edge of $G$, since attributes can label both a-edges and r-edges. (In the total attribute
case, attributes can only label a-edges so that examining r-edges was not necessary.) The
second and most important distinction is the treatment of same-as equalities. As shown in
the above algorithm, with $D = a_1 \cdots a_n \downarrow b_1 \cdots b_m$ one only needs to check whether there
exist two paths labeled $v := a_1 \cdots a_n$ and $w := b_1 \cdots b_m$ leading to the same node in $G$. In the
total attribute case, however, it suffices if there exist prefixes $v'$ and $w'$ of $v$ and $w$ with this
property, as long as the remaining suffixes are identical.
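
The difference in the treatment of same-as equalities can be made concrete with a short Python sketch. It assumes a deterministic graph whose a-edges are stored in a dictionary from (node, attribute) to the target node; the function names `follow`, `same_as_partial` and `same_as_total`, as well as the attribute `owner`, are hypothetical and serve only as an illustration.

```python
def follow(edges, node, path):
    """Follow `path` from `node`; return None if some a-edge is missing."""
    for attribute in path:
        node = edges.get((node, attribute))
        if node is None:
            return None
    return node


def same_as_partial(edges, root, v, w):
    """Partial attributes: both complete paths must exist and meet."""
    end_v, end_w = follow(edges, root, v), follow(edges, root, w)
    return end_v is not None and end_v == end_w


def same_as_total(edges, root, v, w):
    """Total attributes: prefixes may meet if the remaining suffixes agree."""
    for k in range(min(len(v), len(w)) + 1):
        # Stop as soon as the last k symbols of v and w no longer coincide.
        if k and v[-k] != w[-k]:
            break
        if same_as_partial(edges, root, v[:len(v) - k], w[:len(w) - k]):
            return True
    return False


# Graph of the equality madeBy = model . madeBy (node names are arbitrary).
edges = {("n0", "madeBy"): "n", ("n0", "model"): "m1", ("m1", "madeBy"): "n"}
v = ("madeBy", "owner")
w = ("model", "madeBy", "owner")
print(same_as_partial(edges, "n0", v, w), same_as_total(edges, "n0", v, w))
```

In the example the extended equality is accepted only under the total reading, because the graph contains no `owner` edge that would guarantee a value for the longer paths.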

Soundness and completeness of this algorithm are stated in the following theorem.

Theorem 3 Let $C$, $D$ be CLASSIC - descriptions. Then, $C \sqsubseteq D$ iff subsumes?$(D, G_C)$,
where $G_C$ is the canonical form of $G(C)$.

The soundness of the subsumption algorithm, i.e., the if direction in the theorem stated
above, is pretty obvious. As in (Borgida & Patel-Schneider, 1994), the main point of the
only-if direction (proof of completeness) is that the canonical graph $G_C$ is deterministic,
i.e., from any node, given a role or attribute name $r$, there is at most one outgoing r-edge
or a-edge with $r$ as label. We point the reader to (Borgida & Patel-Schneider, 1994) for
the proof, since it is almost identical to the one for total attributes already published there.
These proofs reveal that, for the if direction of Theorem 3, description graphs need not be
normalized. Thus, one can also show:

Remark 1 Let $G$ be some (not necessarily normalized) description graph and let $D$ be a
CLASSIC - concept description. Then, subsumes?$(D, G)$ implies $G \sqsubseteq D$.

Borgida and Patel-Schneider argue that the canonical description graph $G$ of a concept
description $C$ can be constructed in time polynomial in the size of $C$. Furthermore,
Algorithm 1 runs in time polynomial in the size of $G$ and $D$. It is not hard to see that the
changes presented here do not increase the complexity. Thus, soundness and completeness
of the subsumption algorithm provide us with the following corollary.

Corollary 1 Subsumption for CLASSIC - concept descriptions C and D, where attributes 
are interpreted as partial functions, can be decided in time polynomial in the size of C and 
D. 

4. Computing the LCS in classic 

In this section, we will show that the lcs of two CLASSIC - concept descriptions can be stated
in terms of a product of canonical description graphs. A similar result has been proven by
Cohen and Hirsh (1994a) for a sublanguage of CLASSIC -, which only allows for concept
names, concept conjunction, value restrictions, and same-as equalities. In particular, this
sublanguage does not allow for inconsistent concept descriptions (which, for example, can be
expressed by conflicting number restrictions). Furthermore, the semantics of the description
graphs provided by Cohen and Hirsh restricts the results to the case when description graphs
are acyclic. This excludes, for example, same-as equalities of the form $\varepsilon \downarrow \mathit{spouse} \circ \mathit{spouse}$.

In the following, we first define the product of description graphs. Then, we show that
for given concept descriptions $C$ and $D$, the lcs is equivalent to a description graph obtained
as the product of $G_C$ and $G_D$. Our constructions and proofs will be quite close to those in
(Cohen & Hirsh, 1994a).

4.1 The Product of Description Graphs 

A description graph represents the constraints that must be satisfied by all individuals in the
extension of the graph. Intuitively, the product of two description graphs is the intersection
of these constraints, just as the product of finite automata corresponds to the intersection of
the words accepted by the automata. However, in the definition of the product of description
graphs special care has to be taken of incoherent nodes, i.e., nodes labeled with $\bot$. Also,
since attributes may occur both in r-edges and a-edges, one needs to take the product
between restriction graphs of r-edges, on the one hand, and the original graphs $G_1$ or $G_2$
(rooted at certain nodes), on the other hand.

Definition 8 Let $G_1 = (N_1, E_1, n_1, l_1)$ and $G_2 = (N_2, E_2, n_2, l_2)$ be two description graphs.
Then, the product $G := G_1 \times G_2 := (N, E, n_0, l)$ of the two graphs is recursively defined as
follows:

1. $N := N_1 \times N_2$;

2. $n_0 := (n_1, n_2)$;

3. $E := \{((n, n'), a, (m, m')) \mid (n, a, m) \in E_1 \text{ and } (n', a, m') \in E_2\}$;

4. Let $n \in N_1$ and $n' \in N_2$. If $l_1(n) = \bot$, then let $l((n, n')) := l_2(n')$ and, analogously,
if $l_2(n') = \bot$, then $l((n, n')) := l_1(n)$. Otherwise, for $l_1(n) = (S_1, H_1)$ and $l_2(n') =
(S_2, H_2)$, define $l((n, n')) := (S, H)$ where

(a) $S := S_1 \cap S_2$;

(b) $H := \{(r, \min(p_1, p_2), \max(q_1, q_2), G'_1 \times G'_2) \mid (r, p_1, q_1, G'_1) \in H_1, (r, p_2, q_2, G'_2) \in H_2\}$
$\cup\ \{(a, 0, 1, G_1|_m \times G'_2) \mid (n, a, m) \in E_1, (a, p_2, q_2, G'_2) \in H_2\}$
$\cup\ \{(a, 0, 1, G'_1 \times G_2|_m) \mid (a, p_1, q_1, G'_1) \in H_1, (n', a, m) \in E_2\}$.

According to this definition, if in the tuple $(n, n')$ some node, say $n$, is incoherent, then
the label of $(n, n')$ coincides with the one for $n'$. The reason for defining the label in this
way is that $\mathit{lcs}(\bot, C) \equiv C$ for every concept description $C$. This has been overlooked by
Frazier and Pitt (1996), thus making their constructions and proofs only hold for concept
descriptions that do not contain inconsistent subexpressions.

Note that $G$, as defined here, might not be connected, i.e., it might contain nodes that
cannot be reached from the root $n_0$. Even if $G_1$ and $G_2$ are connected this can happen
because all tuples $(n_1, n_2)$ belong to the set of nodes of $G$ regardless of whether they are
reachable from the root or not. However, as already mentioned in Section 3.1 we may
assume $G$ to be connected.

Also note that the product graph can be translated back into a CLASSIC - concept 
description since the product of two description graphs is once again a description graph. 
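
To illustrate the product construction, here is a minimal Python sketch under simplifying assumptions: graphs carry only a-edges and atom sets (r-edges and nested restriction graphs are left out), only the part reachable from the root is built, and the encoding as (edges, root, labels) together with the names `product` and `BOTTOM` is hypothetical.

```python
BOTTOM = None  # hypothetical marker for the label of an incoherent node


def product(g1, g2):
    """Reachable part of the product of two graphs (edges, root, labels)."""
    edges1, root1, labels1 = g1
    edges2, root2, labels2 = g2
    root = (root1, root2)
    nodes, edges, todo = {root}, set(), [root]
    while todo:
        n, n2 = todo.pop()
        for (s1, a1, t1) in edges1:
            for (s2, a2, t2) in edges2:
                # An a-edge survives only if both graphs offer it in parallel.
                if s1 == n and s2 == n2 and a1 == a2:
                    edges.add(((n, n2), a1, (t1, t2)))
                    if (t1, t2) not in nodes:
                        nodes.add((t1, t2))
                        todo.append((t1, t2))
    labels = {}
    for (n, n2) in nodes:
        l1, l2 = labels1[n], labels2[n2]
        if l1 is BOTTOM:
            labels[(n, n2)] = l2      # an incoherent node yields the other label
        elif l2 is BOTTOM:
            labels[(n, n2)] = l1
        else:
            labels[(n, n2)] = l1 & l2  # keep only the shared atoms
    return edges, root, labels


g1 = ({("x0", "model", "x1"), ("x1", "madeBy", "x2"), ("x0", "madeBy", "x2")}, "x0",
      {"x0": frozenset({"Car", "Lemon"}), "x1": frozenset({"Model"}),
       "x2": frozenset({"Company"})})
g2 = ({("y0", "model", "y1"), ("y0", "madeBy", "y2")}, "y0",
      {"y0": frozenset({"Car"}), "y1": frozenset({"Model"}),
       "y2": frozenset({"Company"})})
print(product(g1, g2))
```

In this example the product keeps the atom `Car` at the root and the two parallel a-edges, while the same-as equality that holds only in the first graph disappears.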

4.2 Computing the LCS 

We now prove the main theorem of this subsection, which states that the product of two 
description graphs is equivalent to the lcs of the corresponding concept descriptions. 

Theorem 4 Let $C_1$ and $C_2$ be two concept descriptions, and let $G_1$ and $G_2$ be corresponding
canonical description graphs. Then, $C_{G_1 \times G_2} \equiv \mathit{lcs}(C_1, C_2)$.

Proof. Let $G := G_1 \times G_2$. We will only sketch the proof showing that $C_G$ subsumes $C_1$ and,
by symmetry, also $C_2$ (see (Küsters & Borgida, 1999) for details). By construction, if there
are two rooted paths to a common node in $G$, then $G_1$ has corresponding paths leading to
the same node as well. Thus, by Theorem 3, the same-as equalities in $C_G$ subsume the ones
in $C_1$. Now, let $T$ be a spanning tree of $G$, $(m_1, m_2)$ be a node in $G$, and $v$ be the label of
the rooted path in $T$ to $(m_1, m_2)$. Then, by construction it follows that there exists a rooted
path in $G_1$ to $m_1$ labeled $v$. Furthermore, a rather straightforward inductive proof shows
that the concept description $E$ corresponding to the label of $(m_1, m_2)$ subsumes $G_1|_{m_1}$.
This implies $\forall v.E \sqsupseteq G_1$. As a result, we can conclude $G \sqsupseteq G_1$.

The more interesting part of the proof is to show that $C_G$ is not only a common subsumer
of $C_1$ and $C_2$, but the least common subsumer.

We now show by induction over the size of $D$, $C_1$, and $C_2$ that if $D$ subsumes $C_1$ and
$C_2$, then $D$ subsumes $C_G$. We distinguish different cases according to the definition of
"subsumes?". Let $G_1 = (N_1, E_1, n_1, l_1)$ be the canonical description graph of $C_1$, $G_2 =
(N_2, E_2, n_2, l_2)$ be the canonical description graph of $C_2$, and $G = (N, E, n_0, l) = G_1 \times G_2$.
In the following, we assume that $C_1 \sqsubseteq D$ and $C_2 \sqsubseteq D$; thus, subsumes?$(D, G_1)$ and
subsumes?$(D, G_2)$. We show that subsumes?$(D, G)$. Then, Remark 1 implies $G \sqsubseteq D$ and
thus, $C_G \sqsubseteq D$. Note that one cannot use Theorem 3 since $G$ might not be a canonical
description graph.

1. If $G$ is incoherent, then there is nothing to show.

2. If $D$ is a concept name, $\top$, or a number restriction, then by definition of the label of
$n_0$ it is easy to see that subsumes?$(D, G)$.

3. If $D$ is $v \downarrow w$, then there exist nodes $m_1$ in $G_1$ and $m_2$ in $G_2$ such that there are two
paths from $n_1$ to $m_1$ with label $v$ and $w$, respectively, as well as two paths from $n_2$ to
$m_2$ with label $v$ and $w$. Then, by definition of $G$ it is easy to see that there are two
paths from $n_0 = (n_1, n_2)$ to $(m_1, m_2)$ with label $v$ and $w$, respectively. This shows
subsumes?$(D, G)$.

4. If $D$ is $\forall r.C$, $r$ a role or attribute, then one of several cases applies:

(i) $n_1$ and $n_2$ have r-edges with role or attribute $r$, and restriction graphs $G'_1$ and $G'_2$,
respectively, such that subsumes?$(C, G'_1)$ and subsumes?$(C, G'_2)$;

(ii) without loss of generality, $n_1$ has an a-edge pointing to $m_1$ with attribute $r$, such
that subsumes?$(C, G'_1)$, where $G'_1 := G_1|_{m_1}$; and $n_2$ has an r-edge with restriction
graph $G'_2$ such that subsumes?$(C, G'_2)$.

In both cases (i) and (ii), subsumes?$(C, G'_1 \times G'_2)$ follows by induction. Furthermore,
by definition of $G$ there is an r-edge with role $r$ and restriction graph $G'_1 \times G'_2$ for $n_0$.
This implies subsumes?$(D, G)$.

(iii) $n_1$ and $n_2$ have a-edges with attribute $r$ leading to nodes $m_1$ and $m_2$, respectively.
Then, subsumes?$(C, G_1|_{m_1})$ and subsumes?$(C, G_2|_{m_2})$. By induction, we know
subsumes?$(C, G_1|_{m_1} \times G_2|_{m_2})$. It is easy to see that $G|_{(m_1, m_2)} = G_1|_{m_1} \times G_2|_{m_2}$.
Furthermore, by definition there is an a-edge with attribute $r$ from $(n_1, n_2)$ to $(m_1, m_2)$
in $G$. This shows subsumes?$(D, G)$.

(iv) (without loss of generality) $n_1$ has no r-edge and no a-edge with role or attribute
$r$. This implies subsumes?$(C, G(\top))$, which also ensures subsumes?$(D, G)$.

5. If $D$ is $E \sqcap F$, then by definition of the subsumption algorithm, subsumes?$(E, G_1)$ and
subsumes?$(E, G_2)$ hold. By induction, we have subsumes?$(E, G)$, and analogously,
subsumes?$(F, G)$. Thus, subsumes?$(D, G)$. □

As stated in Section 3.5, a canonical description graph for a CLASSIC - concept description 
can be computed in time polynomial in the size of the concept description. It is not hard 
to verify that the product of two description graphs can be computed in time polynomial in 
the size of the graphs. In addition, the concept description corresponding to a description 
graph can be computed in time polynomial in the size of the graph. Thus, as a consequence 
of Theorem 4 we obtain: 

Corollary 2 The lcs of two CLASSIC - concept descriptions always exists and can be
computed in time polynomial in the size of the concept descriptions.

Figure 3: The canonical description graph for $D_i$, without node labels.



As intimated in (Cohen et al., 1992), this statement does not hold for sequences of concept 
descriptions. Intuitively, generalizing the lcs algorithm to sequences of, say, n concept de- 
scriptions, means computing the product of n description graphs. The following proposition 
shows that the size of such a product graph may grow exponentially in n. Thus, the lcs 
computed in this way grows exponentially in the size of the given sequence. However, this 
does not imply that this exponential blow-up is unavoidable. There might exist a smaller, 
still equivalent representation of the lcs. Nevertheless, we can show that the exponential 
growth is inevitable. 

Proposition 1 For all integers $n \geq 2$ there exists a sequence $D_1, \ldots, D_n$ of CLASSIC -
concept descriptions such that the size of every CLASSIC - concept description equivalent to
$\mathit{lcs}(D_1, \ldots, D_n)$ is at least exponential in $n$ where the size of the $D_i$'s is linear in $n$.

Proof. As in Cohen et al. (1992), for a given $n$, define the concept descriptions $D_i$ as
follows:

$$D_i := \bigsqcap_{j \neq i} (\varepsilon \downarrow a_j) \sqcap \bigsqcap_{j \neq i} (a_i \downarrow a_i a_j) \sqcap (\varepsilon \downarrow a_i a_i)$$

where $a_1, \ldots, a_n$ denote attributes. The canonical description graph for $D_i$ is depicted in
Figure 3. Using Algorithm 1 it is easy to see that $D_i \sqsubseteq v \downarrow w$ iff the number of $a_i$'s in $v$ and
the number of $a_i$'s in $w$ are equal modulo 2 where $v, w$ are words over $\{a_1, \ldots, a_n\}$. This
implies that

$D_1, \ldots, D_n \sqsubseteq v \downarrow w$ iff for all $1 \leq i \leq n$ the number of $a_i$'s in $v$ and
the number of $a_i$'s in $w$ are equal modulo 2. (1)

Let $s \subseteq \{1, \ldots, n\}$ be a non-empty set. We define $v_s := a_{i_1} \cdots a_{i_k}$ where $i_1 < \cdots < i_k$
are the elements of $s$ and $w_s := a_{i_1}^3 \cdots a_{i_k}^3$ with $a_j^3 := a_j a_j a_j$. Now let $E$ be the lcs
of $D_1, \ldots, D_n$, and let $G_E$ be the corresponding canonical description graph with root $n_0$.
From (1) we know that $E \sqsubseteq v_s \downarrow w_s$ for every $s \subseteq \{1, \ldots, n\}$. Algorithm 1 implies that
the paths from $n_0$ in $G_E$ labeled $v_s$ and $w_s$ exist and that they lead to the same node $q_s$.
Assume there are non-empty subsets $s, t$ of $\{1, \ldots, n\}$, $s \neq t$, such that $q_s = q_t$. This would
imply $E \sqsubseteq v_s \downarrow v_t$ in contradiction to (1). Thus, $s \neq t$ implies $q_s \neq q_t$. Since there are
$2^n - 1$ non-empty subsets of $\{1, \ldots, n\}$, this shows that $G_E$ contains at least $2^n - 1$ nodes.
The fact that the size of $G_E$ is linear in the size of $E$ completes the proof. □

This proposition shows that algorithms computing the lcs of sequences are necessarily
worst-case exponential. Conversely, based on the polynomial time algorithm for the binary lcs
operation, an exponential time algorithm can easily be specified employing the following
identity: $\mathit{lcs}(D_1, \ldots, D_n) \equiv \mathit{lcs}(D_n, \mathit{lcs}(D_{n-1}, \mathit{lcs}(\cdots \mathit{lcs}(D_2, D_1) \cdots)))$.

Corollary 3 The size of the lcs of sequences of CLASSIC - concept descriptions can grow
exponentially in the size of the sequences and there exists an exponential time algorithm for
computing the lcs.
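
The growth behind Proposition 1 can be observed experimentally with a small Python sketch. It assumes that the canonical graph of $D_i$ is the two-node graph in which $a_i$ switches between the nodes and every other attribute is a self-loop (our reading of the definition of $D_i$, since Figure 3 is not reproduced here); node labels are ignored and the function names are hypothetical. The sketch only counts nodes of the iterated product and does not by itself establish the lower bound for every equivalent description.

```python
from functools import reduce


def parity_graph(i, n):
    """Two-node graph for D_i: a_i toggles the node, every other a_j loops."""
    edges = {}
    for node in (0, 1):
        for j in range(1, n + 1):
            edges[(node, j)] = 1 - node if j == i else node
    return edges, 0


def product(g1, g2):
    """Reachable part of the product of two deterministic graphs (edges, root)."""
    edges1, root1 = g1
    edges2, root2 = g2
    root = (root1, root2)
    edges, todo, seen = {}, [root], {root}
    while todo:
        n1, n2 = todo.pop()
        for (source, attribute), target1 in edges1.items():
            if source == n1 and (n2, attribute) in edges2:
                target = (target1, edges2[(n2, attribute)])
                edges[((n1, n2), attribute)] = target
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    return edges, root


def node_count(graph):
    edges, root = graph
    return len({root} | {source for (source, _) in edges} | set(edges.values()))


def lcs_graph(graphs):
    # Fold the binary product over the sequence, mirroring the nested identity.
    return reduce(product, graphs)


for n in range(2, 7):
    graphs = [parity_graph(i, n) for i in range(1, n + 1)]
    print(n, node_count(lcs_graph(graphs)))
```

Each additional graph doubles the number of reachable product nodes in this family, since every node of the product records the parity of each attribute separately.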

5. The LCS for Same-as and Total Attributes 

In the previous sections, attributes were interpreted as partial functions. In this section,
we will present the significant changes in computing the lcs that occur when considering
total functions instead of partial functions. More precisely, we will look at a sublanguage
S of CLASSIC - that only allows for concept conjunction and same-as equalities, but where
we have the general assumption that attributes are interpreted as total functions.

We restrict our attention to the language S in order to concentrate on the changes
caused by going from partial to total functions. We strongly conjecture, however, that the
results presented here can easily be transferred to CLASSIC - by extending the description
graphs for S as in Section 4.

First, we show that in S the $\mathit{lcs}_t$ of two concept descriptions does not always exist.
Then, we will present a polynomial decision algorithm for the existence of an $\mathit{lcs}_t$ of two
concept descriptions. Finally, it will be shown that if the $\mathit{lcs}_t$ of two concept descriptions
exists, then it might be exponential in the size of the given concept descriptions and it can
be computed in exponential time.

In the sequel, we will simply refer to the $\mathit{lcs}_t$ by lcs. Since throughout the section
attributes are always assumed to be total, this does not lead to any confusion.

Once again, it may be useful to keep in mind that for total (though not partial) attributes
we have $(u \downarrow v) \sqsubseteq_t (uw \downarrow vw)$ for any $u, w, v \in A^*$, where $A^*$
is the set of finite words over $A$, the finite set of attribute names. Indeed, all the
differences between partial and total attributes shown in this section finally trace back to
this property.

5.1 The Existence of the LCS 

In this subsection, we prove that the lcs of two concept descriptions in $\mathcal{S}$ does not
always exist. Nevertheless, there is always an infinite representation of the lcs, which will
be used in the next subsection to characterize the existence of the lcs.

To accomplish the above, we return to the graph-based characterization of t-subsumption
proposed by Borgida and Patel-Schneider (1994), and modified for partial attributes in
Section 3. For a concept description $C$, let $G_C$ denote the corresponding canonical
description graph, as defined in Section 3.4. Its semantics is specified as in Section 3.1,
although now the set of interpretations is restricted to allow attributes to be interpreted as
total functions only.

Since $\mathcal{S}$ contains no concept names and does not allow for value-restrictions, the
nodes in $G_C$ do not contain concept names and the set of r-edges is empty. Therefore, $G_C$
can be defined by the triple $(N, E, n_0)$ where $N$ is a finite set of nodes, $E$ is a finite
subset of $N \times A \times N$, and $n_0$ is the root of the graph.

As a corollary of the results of Borgida and Patel-Schneider, subsumption $C \sqsubseteq_t D$ of
concept descriptions $C$ and $D$ in $\mathcal{S}$ can be decided with the following algorithm,
which also provides us with a characterization of t-subsumption.

Algorithm 2 Let $C$, $D$ be concept descriptions in $\mathcal{S}$, and $G_C = (N, E, n_0)$ be the
canonical description graph of $C$. Then, $subsumes_t(D, G_C)$ is defined to be true if and only
if one of the following conditions holds:

1. $D$ is $v \downarrow w$ and there are words $v', w', u \in A^*$ such that $v = v'u$ and
$w = w'u$, and there are rooted paths in $G_C$ labeled $v'$ and $w'$, respectively, ending at the
same node.

2. $D$ is $D_1 \sqcap D_2$ and both $subsumes_t(D_1, G_C)$ and $subsumes_t(D_2, G_C)$ are true.

Apart from the additional constructors handled by Algorithm 1, Algorithm 2 only differs 
from Algorithm 1 in that, for total attributes, as considered here, it is sufficient if prefixes 
of rooted paths v and w lead to a common node, as long as the remainder in both cases is 
the same path. 
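
To make the two conditions of Algorithm 2 concrete, here is a minimal Python sketch. It
rests on several assumptions of ours: a deterministic graph is a dictionary mapping each
node to its attribute-labeled successors, attributes are single characters (so a word is a
string), and a conjunction of same-as equalities is a list of pairs. The names `walk`,
`subsumes_t`, and `G_C0` are hypothetical.

```python
def walk(graph, node, word):
    """Follow `word` from `node`; return None if the path does not exist."""
    for a in word:
        node = graph.get(node, {}).get(a)
        if node is None:
            return None
    return node

def subsumes_t(d, graph, root="n0"):
    """d is a list of pairs (v, w), read as the conjunction of v↓w."""
    for v, w in d:
        found = False
        # Try every common suffix u of length k, i.e. v = v'u and w = w'u.
        for k in range(min(len(v), len(w)) + 1):
            if k and v[-k] != w[-k]:
                break  # no longer common suffix exists
            end_v = walk(graph, root, v[:len(v) - k])
            end_w = walk(graph, root, w[:len(w) - k])
            if end_v is not None and end_v == end_w:
                found = True
                break
        if not found:
            return False
    return True

# Canonical graph of a↓b: both edges lead to the same node.
G_C0 = {"n0": {"a": "n1", "b": "n1"}, "n1": {}}
```

With this graph, `subsumes_t([("ac", "bc")], G_C0)` holds because the prefixes `a` and `b`
meet and the remainder `c` is shared, whereas `subsumes_t([("a", "c")], G_C0)` does not.
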

Theorem 5 There are concept descriptions in $\mathcal{S}$ such that the lcs of these concept
descriptions does not exist in $\mathcal{S}$.

This result corrects the statement of Cohen et al. (1992) that the lcs always exists, a 
statement that inadvertently assumed that attributes were partial, not total. 

As proof, we offer the following $\mathcal{S}$-concept descriptions, which are shown not to have
an lcs:

$$C_0 := a \downarrow b,$$

$$D_0 := a \downarrow ac \;\sqcap\; b \downarrow bc \;\sqcap\; ad \downarrow bd.$$

The graphs for these concepts are depicted in Figure 4. 

The following statement shows that an lcs $E$ of $C_0$ and $D_0$ would satisfy a condition
which does not have a "regular structure". This statement can easily be verified using
Algorithm 2.

$E \sqsubseteq_t v \downarrow w$ iff $v = w$ or there exists a nonnegative integer $n$ and
$u \in A^*$ such that $v = ac^n du$ and $w = bc^n du$ or vice versa.

Given this description of the lcs of $C_0$ and $D_0$, one can show, again by employing
Algorithm 2, that no finite description graph can be equivalent to $E$. However, we omit this
elementary proof here, because the absence of the lcs also follows from Theorem 6, where
infinite graphs are used to characterize the existence of an lcs. Note that in the partial
attribute case, the lcs of $C_0$ and $D_0$ is equivalent to $a \downarrow a \sqcap b \downarrow b$,
a result that can be obtained by the lcs algorithm presented in the previous section. The
corresponding (finite) description graph consists of a root and two additional nodes, where the
root has two outgoing edges leading to the two nodes and labeled $a$ and $b$, respectively.

To state Theorem 6, we first introduce infinite description graphs and show that there
always exists an infinite description graph representing the lcs of two $\mathcal{S}$-concept
descriptions.

An infinite description graph $G$ is defined, like a finite graph, by a triple $(N, E, n_0)$
except that the set of nodes $N$ and the set of edges $E$ may be infinite. As in the finite
case, $nvn' \in G$ means that $G$ contains a path from $n$ to $n'$ labeled with the word
$v \in A^*$. The semantics of infinite graphs is defined as in the finite case. Furthermore,
infinite graphs are translated into concept descriptions as follows: take an (infinite)
spanning tree $T$ of $G$, and, as in the finite case, for every edge of $G$ not contained in it,
add to $C_G$ a same-as equality. Note that in contrast to the partial attribute case, $C_G$ need
not contain same-as equalities of the form $v \downarrow v$ since, for total attributes,
$v \downarrow v \equiv_t \top$. Still, $C_G$ might be a concept description with an infinite number
of conjuncts (thus, an infinite concept description). The semantics of such concept
descriptions is defined in the obvious way. Analogously to Lemma 2, one can show that an
(infinite) graph $G$ and its corresponding (infinite) concept description $C_G$ are equivalent,
i.e., $C_G \equiv_t G$.

We call an (infinite) description graph $G$ deterministic if, and only if, for every node $n$
in $G$ and every attribute $a \in A$ there exists at most one $a$-successor for $n$ in $G$. The
graph $G$ is called complete if for every node $n$ in $G$ and every attribute $a \in A$ there is
(at least) one $a$-successor for $n$ in $G$. Clearly, for a deterministic and complete (infinite)
description graph, every path is uniquely determined by its starting point and its label.

Algorithm 2 (which deals with finite description graphs $G_C$) can be generalized to
deterministic and complete (infinite) description graphs $G$ in a straightforward way. To see
this, first note that a (finite) description graph coming from an $\mathcal{S}$-concept
description is canonical iff it is deterministic in the sense just introduced. Analogously, a
deterministic infinite graph can be viewed as being canonical. Thus, requiring (infinite)
graphs to be deterministic satisfies the precondition of Algorithm 2. Now, if in addition these
graphs are complete, then (unlike the condition stated in the subsumption algorithm) it is no
longer necessary to consider prefixes of words because a complete graph contains a rooted path
for every word. More precisely, if $v'$ and $w'$ lead to the same node, then this is the case for
$v = v'u$ and $w = w'u$ as well, thus making it unnecessary to consider the prefixes $v'$ and $w'$
of $v$ and $w$, respectively. Summing up, we can conclude:

Corollary 4 Let $G = (N, E, n_0)$ be a deterministic and complete (infinite) description
graph and $v, w \in A^*$. Then,

$G \sqsubseteq_t v \downarrow w$ iff $n_0 v n \in G$ and $n_0 w n \in G$ for some node $n$.

We shall construct an (infinite) graph representing the lcs of two concept descriptions in
$\mathcal{S}$ as the product of the so-called completed canonical graphs. This infinite
representation of the lcs will be used later to characterize the existence of an lcs in
$\mathcal{S}$, i.e., the existence of a finite representation of the lcs.

We now define the completion of a graph. Intuitively, a graph is completed by iteratively
adding an outgoing a-edge labeled with an attribute $a$ for every node in the graph that does
not have such an outgoing a-edge. This process might extend a graph by infinite trees. As
an example, the completion of $G_{C_0}$ (cf. Figure 4) is depicted in Figure 5 with
$A = \{a, b, c, d\}$.

Formally, completions are defined as follows: Let $G$ be an (infinite) description graph.
The graph $G'$ is an extension of $G$ if for every node $n$ in $G$ and for every attribute
$a \in A$ such that $n$ has no outgoing edge labeled $a$, a new node $m_{n,a}$ is added, as well
as an edge $(n, a, m_{n,a})$. Now, let $G^0, G^1, G^2, \ldots$ be a sequence of graphs such that
$G^0 = G$ and $G^{i+1}$ is an extension of $G^i$, for $i \geq 0$. If $G^i = (N_i, E_i, n_0)$, then

$$G^\infty := \Big(\bigcup_{i \geq 0} N_i,\ \bigcup_{i \geq 0} E_i,\ n_0\Big)$$

is called the completion of $G$. By construction, $G^\infty$ is a complete graph. Furthermore, if
$G$ is deterministic, then $G^\infty$ is deterministic as well. Finally, it is easy to see that a
graph and its extension are equivalent. Thus, by induction, $G^\infty \equiv_t G$.

The nodes in $\bigcup_{i \geq 1} N_i \setminus N_0$, i.e., the nodes in $G^\infty$ that are not in
$G$, are called tree nodes; the nodes of $G$ are called non-tree nodes. By construction, for
every tree node $t$ in $G^\infty$ there is exactly one direct predecessor of $t$ in $G^\infty$, i.e.,
there is exactly one node $n$ and one attribute $a$ such that $(n, a, t)$ is an edge in
$G^\infty$; $n$ is called the $a$-predecessor of $t$. Furthermore, there is exactly one youngest
ancestor $n$ in $G$ of a tree node $t$ in $G^\infty$; $n$ is the youngest ancestor of $t$ if there
is a path from $n$ to $t$ in $G^\infty$ which does not contain non-tree nodes except for $n$.
Note that there is only one path from $n$ to $t$ in $G^\infty$. Finally, observe that non-tree
nodes have only non-tree nodes as ancestors.

Note that the completion of a canonical description graph is always complete and de- 
terministic. 
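
Since the completion is infinite, it cannot be stored explicitly, but it can be explored on
demand. The following Python sketch is one way to do so under our own conventions: a tree
node is represented by its youngest ancestor together with the word leading from that
ancestor to it, which identifies the tree node uniquely. The names `step`, `run`, and
`entails` are hypothetical, and attributes are assumed to be single characters.

```python
def step(graph, node, a):
    """One a-step in the completion of a finite deterministic graph.

    Non-tree nodes are the keys of `graph`; a tree node is the tuple
    ("tree", youngest_ancestor, word_from_ancestor)."""
    if isinstance(node, tuple) and node[0] == "tree":
        _, ancestor, word = node
        return ("tree", ancestor, word + a)
    successor = graph[node].get(a)
    return successor if successor is not None else ("tree", node, a)

def run(graph, root, word):
    """The unique node of the completion reached from `root` via `word`."""
    node = root
    for a in word:
        node = step(graph, node, a)
    return node

def entails(graph, root, v, w):
    """Test of Corollary 4: v and w must end at the same node."""
    return run(graph, root, v) == run(graph, root, w)

# Canonical graph of a↓b, as in the running example.
G_C0 = {"n0": {"a": "n1", "b": "n1"}, "n1": {}}
```

Because every word has a rooted path in the completion, no prefixes have to be tried here,
in contrast to the test for finite graphs.
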

In the sequel, let $C$, $D$ be two concept descriptions in $\mathcal{S}$, let
$G_C = (N_C, E_C, n_C)$ and $G_D = (N_D, E_D, n_D)$ be their corresponding canonical graphs, and
let $G_C^\infty$, $G_D^\infty$ be the completions of $G_C$, $G_D$. The products
$G := G_C \times G_D$ and $G^\infty_\times := G_C^\infty \times G_D^\infty$ are specified as in
Definition 1. As usual, we may assume that $G$ and $G^\infty_\times$ are connected, i.e., they
only contain nodes that are reachable from the root $(n_C, n_D)$; otherwise, one can remove all
those nodes that cannot be reached from the root without changing the semantics of the graphs.

We denote the product $G_C^\infty \times G_D^\infty$ by $G^\infty_\times$ instead of $G^\infty$
because otherwise this graph could be confused with the completion of $G$. In general, these
graphs do not coincide. As an example, take the products $G_{C_0} \times G_{D_0}$ and
$G_{C_0}^\infty \times G_{D_0}^\infty$ (see Figure 4 for the graphs $G_{C_0}$ and $G_{D_0}$). The
former product results in a graph that consists of a root with two outgoing a-edges, one labeled
$a$ and the other one labeled $b$. (As mentioned before, this graph corresponds to the lcs of
$C_0$ and $D_0$ in the partial attribute case.) The product of the completed graphs, on the other
hand, is a graph that is obtained as the completion of the graph depicted in Figure 6 (the
infinite trees are omitted for the sake of simplicity).

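As an easy consequence of the fact that $G_C \equiv_t G_C^\infty$ and Corollary 4, one can prove
the following lemma.

Lemma 6 $C \sqsubseteq_t v \downarrow w$ iff $n_C v n \in G_C^\infty$ and $n_C w n \in G_C^\infty$
for a node $n$ in $G_C^\infty$.

But then, by the construction of $G^\infty_\times$ we know:

Proposition 2 $C \sqsubseteq_t v \downarrow w$ and $D \sqsubseteq_t v \downarrow w$ iff
$(n_C, n_D) v n \in G^\infty_\times$ and $(n_C, n_D) w n \in G^\infty_\times$ for a node $n$ in
$G^\infty_\times$.

In particular, $G^\infty_\times$ represents the lcs of the concept descriptions $C$ and $D$ in the
following sense:

Corollary 5 The (infinite) concept description $C_{G^\infty_\times}$ corresponding to
$G^\infty_\times$ is the lcs of $C$ and $D$, i.e., i) $C, D \sqsubseteq_t C_{G^\infty_\times}$ and
ii) $C, D \sqsubseteq_t E'$ implies $C_{G^\infty_\times} \sqsubseteq_t E'$ for every
$\mathcal{S}$-concept description $E'$.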
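
Proposition 2 can be tried out on the running example by walking the two completions in
lockstep. The Python sketch below is illustrative only: the graph encodings `G_C0` and
`G_D0` reflect our reading of the canonical graphs of $a \downarrow b$ and
$a \downarrow ac \sqcap b \downarrow bc \sqcap ad \downarrow bd$, tree nodes are encoded by
their youngest ancestor and the word leading to them, and all function names are hypothetical.

```python
def step(graph, node, a):
    """One a-step in the lazily built completion of a finite deterministic graph."""
    if isinstance(node, tuple) and node[0] == "tree":
        return ("tree", node[1], node[2] + a)
    successor = graph[node].get(a)
    return successor if successor is not None else ("tree", node, a)

def product_run(gc, gd, word, root=("n0", "n0")):
    """Node of the product of the completions reached from the root via `word`."""
    x, y = root
    for a in word:
        x, y = step(gc, x, a), step(gd, y, a)
    return (x, y)

def both_subsumed_by(gc, gd, v, w):
    """True iff v and w end at the same product node (the test of Proposition 2)."""
    return product_run(gc, gd, v) == product_run(gc, gd, w)

G_C0 = {"n0": {"a": "n1", "b": "n1"}, "n1": {}}
G_D0 = {"n0": {"a": "p", "b": "q"},
        "p": {"c": "p", "d": "s"},
        "q": {"c": "q", "d": "s"},
        "s": {}}

# Same-as equalities of the form a c^n d ↓ b c^n d, for n = 0..3.
family = [both_subsumed_by(G_C0, G_D0, "a" + "c" * n + "d", "b" + "c" * n + "d")
          for n in range(4)]
```

Every member of `family` holds, while `both_subsumed_by(G_C0, G_D0, "ac", "bc")` fails
because the second components differ; this matches the description of $E$ given after
Theorem 5.
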

5.2 Characterizing the Existence of an LCS 

Let $C$, $D$ be concept descriptions in $\mathcal{S}$ and let the graphs $G_C$, $G_D$, $G$,
$G_C^\infty$, $G_D^\infty$, and $G^\infty_\times$ be defined as above.

We will show that $G^\infty_\times$ not only represents a (possibly infinite) lcs of the
$\mathcal{S}$-concept descriptions $C$ and $D$ (Corollary 5), but that $G^\infty_\times$ can be
used to characterize the existence of a finite lcs. The existence depends on whether
$G^\infty_\times$ contains a finite or an infinite number of so-called same-as nodes.

Definition 9 A node n of an (infinite) description graph H is called a same-as node if 
there exist two direct predecessors of n in H. (The a-edges leading to n from these nodes 
may be labeled differently.) 



Figure 6: A subgraph of $G_{C_0}^\infty \times G_{D_0}^\infty$

For example, the graph depicted in Figure 6 contains an infinite number of same-as nodes.
We will show that this is a sufficient and necessary condition for the lcs of $C_0$ and $D_0$
not to exist.
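
The growth of the number of same-as nodes can be observed by exploring the product of the
completions up to a bounded depth. The following Python sketch does this for the running
example. It assumes that a same-as node is a node with at least two distinct direct
predecessor nodes, encodes tree nodes by their youngest ancestor and the word leading to
them, and uses hypothetical names throughout.

```python
from collections import defaultdict

def step(graph, node, a):
    """One a-step in the lazily built completion of a finite deterministic graph."""
    if isinstance(node, tuple) and node[0] == "tree":
        return ("tree", node[1], node[2] + a)
    successor = graph[node].get(a)
    return successor if successor is not None else ("tree", node, a)

def same_as_nodes(gc, gd, alphabet, depth, root=("n0", "n0")):
    """Same-as nodes found among product nodes reachable by words of length <= depth."""
    predecessors = defaultdict(set)
    frontier, seen = {root}, {root}
    for _ in range(depth):
        next_frontier = set()
        for (x, y) in frontier:
            for a in alphabet:
                child = (step(gc, x, a), step(gd, y, a))
                predecessors[child].add((x, y))
                if child not in seen:
                    seen.add(child)
                    next_frontier.add(child)
        frontier = next_frontier
    return {n for n, ps in predecessors.items() if len(ps) >= 2}

G_C0 = {"n0": {"a": "n1", "b": "n1"}, "n1": {}}
G_D0 = {"n0": {"a": "p", "b": "q"},
        "p": {"c": "p", "d": "s"},
        "q": {"c": "q", "d": "s"},
        "s": {}}

counts = [len(same_as_nodes(G_C0, G_D0, "abcd", d)) for d in (3, 6)]
```

Each additional level contributes the node reached by $ac^n d$ and $bc^n d$ for one more
value of $n$, so the count keeps increasing with the depth bound.
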

It is helpful to observe that same-as nodes in $G^\infty_\times$ have one of the forms $(g, f)$,
$(f, t)$, and $(t, f)$, where $g$ and $f$ are non-tree nodes and $t$ is a tree node. There cannot
exist a same-as node of the form $(t_1, t_2)$, where both $t_1$ and $t_2$ are tree nodes, since
tree nodes have exactly one direct predecessor, and thus so does $(t_1, t_2)$. Moreover, if
$G^\infty_\times$ has an infinite number of same-as nodes, it must have an infinite number of
same-as nodes of the form $(f, t)$ or $(t, f)$, because there only exist a finite number of
nodes in $G^\infty_\times$ of the form $(g, f)$. For this reason, in the following lemma we only
characterize same-as nodes of the form $(f, t)$. (Nodes of the form $(t, f)$ can be dealt with
analogously.) To state the lemma, recall that with $n_0 u \cdot n_1 \cdot v n_2 \in H$, for some
graph $H$, we describe a path in $H$ labeled $uv$ from $n_0$ to $n_2$ that passes through node
$n_1$ after $u$ (i.e., $n_0 u n_1 \in H$ and $n_1 v n_2 \in H$); this is generalized in the
obvious way to interpret $n_0 u_1 \cdot n_1 \cdot u_2 \cdot n_2 \cdot u_3 n_3 \in H$.

Figure 7: Same-as nodes in $G^\infty_\times$



Lemma 7 Given a node $f$ in $G_C$ and a tree node $t$ in $G_D^\infty$, the node $n = (f, t)$ in
$G^\infty_\times$ is a same-as node iff

• there exist nodes $(h_1, p_0)$, $(h_2, p_0)$ in $G$, $h_1 \neq h_2$;

• there exist nodes $(e_1, q_0)$, $(e_2, q_0)$ in $G^\infty_\times$, where $e_1$, $e_2$ are
distinct nodes in $G_C$ and $q_0$ is a node in $G_D^\infty$; and

• there exists an attribute $a \in A$ and $v, w, x \in A^*$, $v \neq w$, where $A$ is the set of
attributes in $C$,

such that

$(n_C, n_D) v \cdot (h_1, p_0) \cdot x \cdot (e_1, q_0) \cdot a (f, t)$ and
$(n_C, n_D) w \cdot (h_2, p_0) \cdot x \cdot (e_2, q_0) \cdot a (f, t)$

are paths in $G^\infty_\times$ (see Figure 7). For the direct successors $(h_1', p_0')$ and
$(h_2', p_0')$ of $(h_1, p_0)$ and $(h_2, p_0)$ in these paths, we, in addition, require $p_0'$ to
be a tree node in $G_D^\infty$. (4)

4. Note that since $G_D^\infty$ is deterministic, the successors of $(h_1, p_0)$ and
$(h_2, p_0)$ in the two paths must in fact be of the form $(\cdot, p_0')$.

Proof. The if direction is obvious. We proceed with the only-if direction and assume that
$n$ is a same-as node in $G^\infty_\times$. Let $p_0$ be the (uniquely determined) youngest
ancestor of $t$ in $G_D^\infty$. In particular, $p_0$ is a node in $G_D$ and there exists
$p_0 x \cdot q_0 \cdot a t$ in $G_D^\infty$ with $a \in A$ and $x \in A^*$ such that the successor
of $p_0$ in this path is a tree node in $G_D^\infty$.

Since $n$ is a same-as node and $t$ can only be reached via $q_0$ and the attribute $a$, there
must exist $e_1$, $e_2$ in $G_C$, $e_1 \neq e_2$, with
$(e_1, q_0) a (f, t), (e_2, q_0) a (f, t) \in G^\infty_\times$. Since $G^\infty_\times$ is
connected, there are paths from $(n_C, n_D)$ to $(e_1, q_0)$ and $(e_2, q_0)$. Every path from
$n_D$ to $q_0$ must pass through $p_0$ and the suffix of the label of this path is $x$.
Consequently, there exist nodes $h_1$, $h_2$ in $G_C$ such that
$(h_1, p_0) x \cdot (e_1, q_0) \cdot a (f, t)$ and $(h_2, p_0) x \cdot (e_2, q_0) \cdot a (f, t)$
are paths in $G^\infty_\times$. In particular, $xa$ is a label of a path from $h_1$ to $f$ in
$G_C$, and the label $xa$ only consists of attributes contained in $C$. If $h_1 = h_2$, then
this, together with the fact that $G_C$ is deterministic, would imply $e_1 = e_2$. Hence,
$h_1 \neq h_2$. Let $v$, $w$ be the labels of the paths from $(n_C, n_D)$ to $(h_1, p_0)$ and
$(h_2, p_0)$, respectively. As $G$ is deterministic and $h_1 \neq h_2$, it follows that
$v \neq w$. □

The main result of this section is stated in the next theorem. As a direct consequence of
this theorem, we obtain that there exists no lcs in $\mathcal{S}$ for the concept descriptions
$C_0$ and $D_0$ of our example.

Theorem 6 The lcs of $C$ and $D$ exists iff the number of same-as nodes in $G^\infty_\times$ is
finite.

Proof. We start by proving the only-if direction. For this purpose, we assume that
$G^\infty_\times$ contains an infinite number of same-as nodes and show that there is no
(finite) lcs for $C$ and $D$ in $\mathcal{S}$.

As argued before, we may assume that $G^\infty_\times$ contains an infinite number of same-as
nodes of the form $(f, t)$ or $(t, f)$, where $t$ is a tree node and $f$ is a non-tree node. More
precisely, say $G^\infty_\times$ contains for every $i \geq 1$ nodes $n_i = (f_i, t_i)$ such that
$f_i$ is a node in $G_C$ and $t_i$ is a tree node in $G_D^\infty$. According to Lemma 7, for every
same-as node $n_i$ there exist nodes $h_{1,i}$, $h_{2,i}$, $e_{1,i}$, $e_{2,i}$ in $G_C$,
$p_{0,i}$ in $G_D$, and $q_{0,i}$ in $G_D^\infty$ as well as $a_i \in A$ and $x_i \in A^*$ with
the properties required in Lemma 7.

Since $G_C$ and $G_D$ are finite description graphs, the number of tuples of the form
$h_{1,i}, h_{2,i}, e_{1,i}, e_{2,i}, p_{0,i}, f_i, a_i$ is finite. Thus, there must be an infinite
number of $i$'s yielding the same tuple $h_1, h_2, e_1, e_2, p_0, f, a$. In particular,
$h_1 \neq h_2$ and $e_1 \neq e_2$ are nodes in $G_C$ and there is an infinite number of same-as
nodes of the form $n_i = (f, t_i)$. Finally, as in the lemma, let $v$, $w$ be the labels of paths
(in $G$) from $(n_C, n_D)$ to $(h_1, p_0)$ and $(h_2, p_0)$.

Now, assume there is an lcs $E$ of $C$ and $D$ in $\mathcal{S}$. According to Corollary 5,
$E \equiv_t C_{G^\infty_\times}$. Let $G_E$ be the finite canonical graph for $E$ with root $n'$.
By Proposition 2 and Lemma 7 we know $E \sqsubseteq_t v x_i a \downarrow w x_i a$. From
Algorithm 2 it follows that there are words $v'$, $w'$, and $u$ such that $v x_i a = v'u$ and
$w x_i a = w'u$, where the paths in $G_E$ starting from $n'$ labeled $v'$, $w'$ lead to the same
node in $G_E$.

If $u \neq \varepsilon$, then $u = u'a$ for some word $u'$. Then, Algorithm 2 ensures
$E \sqsubseteq_t v x_i \downarrow w x_i$. However, by Lemma 7 we know that the words $v x_i$ and
$w x_i$ lead to different nodes in $G^\infty_\times$, namely, $(e_1, q_{0,i})$ and
$(e_2, q_{0,i})$, which, with Proposition 2, leads to the contradiction
$E \equiv_t G^\infty_\times \not\sqsubseteq_t v x_i \downarrow w x_i$. Thus, $u = \varepsilon$.

As a result, for every $i \geq 1$ there exists a node $q_i$ in $G_E$ such that $n' v x_i a q_i$
and $n' w x_i a q_i$ are paths in $G_E$. Because $G_E$ is a finite description graph, there exist
$i, j \geq 1$, $i \neq j$, with $q_i = q_j$. By Algorithm 2, this implies
$E \sqsubseteq_t v x_i a \downarrow w x_j a$. On the other hand, the path in $G^\infty_\times$
starting from $(n_C, n_D)$ with label $v x_i a$ leads to the node $n_i$ and the one for
$w x_j a$ leads to $n_j$. Since $n_i \neq n_j$, Proposition 2 implies
$E \not\sqsubseteq_t v x_i a \downarrow w x_j a$, which is a contradiction. Hence, there does not
exist an lcs for $C$ and $D$ in $\mathcal{S}$, which completes the proof of the only-if
direction.

We now prove the if direction of Theorem 6. For this purpose, we assume that
$G^\infty_\times$ has only a finite number of same-as nodes. Note that every same-as node in
$G^\infty_\times$ has only a finite number of direct predecessors. To see this, two cases are
distinguished: i) a node of the form $(g_1, g_2)$ in $G$ has only predecessors in $G$; ii) if $t$
is a tree node and $g$ a non-tree node, then a predecessor of $(g, t)$ in $G^\infty_\times$ is of
the form $(g', t')$ where $t'$ is the unique predecessor (tree or non-tree node) of $t$ and $g'$
is a non-tree node. Since the number of nodes in $G_C$ and $G_D$ is finite, in both cases we only
have a finite number of predecessors. But then, the spanning tree $T$ of $G^\infty_\times$
coincides with $G^\infty_\times$ except for a finite number of edges because, if $T$ does not
contain a certain edge, then this edge leads to a same-as node. As a result,
$C_{G^\infty_\times}$ is an $\mathcal{S}$-concept description because it is a finite conjunction
of same-as equalities. Finally, Corollary 5 shows that $C_{G^\infty_\times}$ is the lcs of $C$
and $D$. □

If $v \downarrow w$ is a conjunct in $C_{G^\infty_\times}$, then $v$ and $w$ lead from the root of
$G^\infty_\times$ to a same-as node. As mentioned before, same-as nodes are of the form
$(f, g)$, $(f, t)$, or $(t, f)$, where $t$ is a tree node and $f$, $g$ are non-tree nodes.
Consequently, $v$ and $w$ must be paths in $G_C$ or $G_D$. Thus, they only contain attributes
occurring in $C$ or $D$.

Corollary 6 If the lcs of two concept descriptions $C$ and $D$ in $\mathcal{S}$ exists, then
there is a concept description in $\mathcal{S}$ only containing attributes occurring in $C$ or
$D$ that is equivalent to the lcs.

Therefore, when asking for the existence of an lcs, we can w.l.o.g. assume that the set of
attributes $A$ is finite. This fact will be used in the following two subsections.

5.3 Deciding the Existence of an LCS 

From the following corollary we will derive the desired decision algorithm for the existence
of an lcs of two concept descriptions in $\mathcal{S}$. To state the corollary we need to
introduce the language
$L_{G_C}(q_1, q_2) := \{ w \in A^* \mid$ there is a path from the node $q_1$ to $q_2$ in $G_C$
labeled $w \}$. Since description graphs can be viewed as finite automata, such a language will
be regular. Moreover, let $aA^*$ denote the set $\{aw \mid w \in A^*\}$ for an attribute
$a \in A$, where $A$ is a finite alphabet.

Corollary 7 $G^\infty_\times$ contains an infinite number of same-as nodes iff either

(i) there exist nodes $(h_1, p_0)$, $(h_2, p_0)$ in $G$ as well as nodes $f$, $e_1$, $e_2$ in
$G_C$, and attributes $a, b \in A$ such that

1. $h_1 \neq h_2$, $e_1 \neq e_2$;

2. $p_0$ does not have a $b$-successor in $G_D$;

3. $(e_1, a, f)$, $(e_2, a, f)$ are edges in $G_C$; and

4. $L_{G_C}(h_1, e_1) \cap L_{G_C}(h_2, e_2) \cap bA^*$ is an infinite set of words;

or

(ii) the same statement as (i) but with the roles of $C$ and $D$ switched.

Proof. We first prove the only-if direction. Assume that $G^\infty_\times$ contains an infinite
number of same-as nodes. Then, w.l.o.g., we find the configuration in $G^\infty_\times$ described
in the proof of Theorem 6. This configuration satisfies the conditions 1. and 3. stated in the
corollary. If, for $i \neq j$, the words $x_i$ and $x_j$ coincide, we can conclude $n_i = n_j$
because $G^\infty_\times$ is a deterministic graph. However, by definition, $n_i \neq n_j$.
Hence, $x_i \neq x_j$. Because $A$ is finite, we can, w.l.o.g., assume that all $x_i$'s have
$b \in A$ as their first letter for some fixed $b$. Thus, condition 4. is satisfied as well.
According to the configuration, the $b$-successor of $(\cdot, p_0)$ in $G^\infty_\times$ is of
the form $(\cdot, p_0')$ where $p_0'$ is a tree node. Thus, $p_0$ does not have a $b$-successor
in $G_D$, which means that condition 2. is satisfied.

We now prove the if direction of the corollary. For this purpose, let
$bx \in L_{G_C}(h_1, e_1) \cap L_{G_C}(h_2, e_2) \cap bA^*$. Since $p_0$ has no $b$-successor in
$G_D$ it follows that there are tree nodes $t$, $t'$ in $G_D^\infty$ such that
$p_0 bx \cdot t \cdot a t' \in G_D^\infty$. Thus, we have
$(h_1, p_0) bx \cdot (e_1, t) \cdot a (f, t') \in G^\infty_\times$ and
$(h_2, p_0) bx \cdot (e_2, t) \cdot a (f, t') \in G^\infty_\times$. Since $e_1 \neq e_2$, we can
conclude $(e_1, t) \neq (e_2, t)$. This means that $(f, t')$ is a same-as node. Analogously, for
$by \in L_{G_C}(h_1, e_1) \cap L_{G_C}(h_2, e_2) \cap bA^*$ there are tree nodes $s$, $s'$ in
$G_D^\infty$ such that $p_0 by \cdot s \cdot a s' \in G_D^\infty$ and $(f, s')$ is a same-as node
in $G^\infty_\times$. Since $bx$ and $by$ both start with $b$, and the $b$-successor of $p_0$ in
$G_D^\infty$ is a tree node, $x \neq y$ implies $s' \neq t'$. Hence, $(f, t')$ and $(f, s')$ are
distinct same-as nodes. This shows that if the set
$L_{G_C}(h_1, e_1) \cap L_{G_C}(h_2, e_2) \cap bA^*$ is infinite, then $G^\infty_\times$ must have
an infinite number of same-as nodes. □

For given nodes $(h_1, p_0)$, $(h_2, p_0)$ in $G$, attributes $a, b \in A$, and nodes
$f, e_1, e_2$ in $G_C$, the conditions 1. to 3. in Corollary 7 can obviously be checked in time
polynomial in the size of the concept descriptions $C$ and $D$. As for the last condition, note
that an automaton accepting the language $L_{G_C}(h_1, e_1) \cap L_{G_C}(h_2, e_2) \cap bA^*$ can
be constructed in time polynomial in the size of $C$. Furthermore, for a given finite automaton
it is decidable in time polynomial in the size of the automaton if it accepts an infinite
language (see the book by Hopcroft and Ullman (1979) for details). Thus, condition 4. can be
tested in time polynomial in the size of $C$ and $D$ as well. Finally, since the size of $G$ and
$G_C$ is polynomial in the size of $C$ and $D$, only a polynomial number of configurations need
to be tested. Together with Corollary 7 these complexities provide us with the following
corollary.

Corollary 8 For given concept descriptions $C$ and $D$ in $\mathcal{S}$ it is decidable in time
polynomial in the size of $C$ and $D$ whether the lcs of $C$ and $D$ exists in $\mathcal{S}$.
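
The test behind Corollary 8 can be sketched in Python as follows. This is a deliberately
naive illustration rather than an optimized implementation: graphs are dictionaries from
nodes to attribute-labeled successors, the root is assumed to be called `n0` in both graphs,
and the names `reach`, `infinite_language`, `case_i`, and `lcs_exists` are hypothetical.
Infiniteness of the language in condition 4 is checked by looking for a cycle that is
reachable from the start state and from which the target state is reachable.

```python
def reach(g1, g2, start):
    """Pairs reachable from `start` when g1 and g2 read the same word in lockstep."""
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for a in g1[x].keys() & g2[y].keys():
            nxt = (g1[x][a], g2[y][a])
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def infinite_language(g, start, target):
    """Are there infinitely many words leading from `start` to `target` in g x g?"""
    for q in reach(g, g, start):
        x, y = q
        successors = {(g[x][a], g[y][a]) for a in g[x].keys() & g[y].keys()}
        on_cycle = any(q in reach(g, g, s) for s in successors)
        if on_cycle and target in reach(g, g, q):
            return True
    return False

def case_i(gc, gd, root="n0"):
    """Condition (i) of Corollary 7 for finite deterministic graphs gc and gd."""
    product = reach(gc, gd, (root, root))
    # Conditions 1 and 3: distinct e1, e2 with a common a-successor f.
    merges = {(e1, e2) for e1 in gc for e2 in gc if e1 != e2
              and any(gc[e1][a] == gc[e2].get(a) for a in gc[e1])}
    for h1, p0 in product:
        for h2, p in product:
            if p != p0 or h1 == h2:
                continue
            for b in gc[h1].keys() & gc[h2].keys():
                if b in gd[p0]:
                    continue  # condition 2 requires p0 to lack a b-successor
                start = (gc[h1][b], gc[h2][b])
                if any(infinite_language(gc, start, m) for m in merges):
                    return True  # condition 4 holds for this configuration
    return False

def lcs_exists(gc, gd):
    return not (case_i(gc, gd) or case_i(gd, gc))

G_C0 = {"n0": {"a": "n1", "b": "n1"}, "n1": {}}
G_D0 = {"n0": {"a": "p", "b": "q"},
        "p": {"c": "p", "d": "s"}, "q": {"c": "q", "d": "s"}, "s": {}}
```

For the running example, the configuration with $h_1 = p$, $h_2 = q$, $b = c$, and $a = d$
(roles of $C$ and $D$ switched) yields the infinite language $c^+$, so `lcs_exists(G_C0, G_D0)`
is false.
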

5.4 Computing the LCS 

In this subsection, we first show that the size of an lcs of two $\mathcal{S}$-concept
descriptions may grow exponentially in the size of the concept descriptions. This is a stronger
result than that presented for partial attributes, where it was only shown that the lcs of a
sequence of concept descriptions in $\mathcal{S}$ can grow exponentially. Then, we present an
exponential time lcs algorithm for $\mathcal{S}$-concept descriptions.

Figure 8: The canonical description graphs for $C$ and $D_k$

In order to show that the lcs may be of exponential size, we consider the following
example, where $A := \{a, b, c, d\}$. We define

$$C := a \downarrow b,$$

$$D_k := \bigsqcap_{i=1}^{k} \left(ac^i \downarrow ad^i\right) \;\sqcap\; \bigsqcap_{i=1}^{k} \left(bc^i \downarrow bd^i\right) \;\sqcap\; ac^k a \downarrow bc^k a.$$

The corresponding canonical description graphs $G_C$ and $G_{D_k}$ are depicted in Figure 8.

A finite graph representing the lcs $E_k$ of $C$ and $D_k$ is depicted in Figure 9 for $k = 2$.
This graph can easily be derived from $G_C^\infty \times G_{D_k}^\infty$. The graph comprises two
binary trees of height $k$, and thus, it contains at least $2^k$ nodes. In the following, we will
show that there is no canonical description graph $G_{E_k}$ (with root $n_0$) representing the
lcs of $C$ and $D_k$ with less than $2^k$ nodes. Let $x \in \{c, d\}^k$ be a word of length $k$
over $\{c, d\}$, and let $v := axa$, $w := bxa$. Using the canonical description graphs $G_C$ and
$G_{D_k}$ it is easy to see that $C \sqsubseteq_t v \downarrow w$ and
$D_k \sqsubseteq_t v \downarrow w$. Thus, $E_k \sqsubseteq_t v \downarrow w$. By Algorithm 2, this
means that there are words $v', w', u$ such that $v = v'u$, $w = w'u$, and there are paths from
$n_0$ labeled $v'$ and $w'$ in $G_{E_k}$ leading to the same node in $G_{E_k}$. Suppose
$u \neq \varepsilon$. Then, Algorithm 2 implies $E_k \sqsubseteq_t ax \downarrow bx$. But
according to $G_{D_k}$, $D_k \not\sqsubseteq_t ax \downarrow bx$. Therefore $u$ must be the empty
word $\varepsilon$. This proves that in $G_{E_k}$ there is a path from $n_0$ labeled $axa$ for
every $x \in \{c, d\}^k$. Hence, there is a path for every $ax$. Now, let $y \in \{c, d\}^k$ be
such that $x \neq y$. If the paths for $ax$ and $ay$ from $n_0$ in $G_{E_k}$ lead to the same
node, then this implies $E_k \sqsubseteq_t ax \downarrow ay$, in contradiction to
$C \not\sqsubseteq_t ax \downarrow ay$. As a result, $ax$ and $ay$ lead to different nodes in
$G_{E_k}$. Since $\{c, d\}^k$ contains $2^k$ words, this shows that $G_{E_k}$ has at least $2^k$
nodes. Finally, taking into account that the size of a canonical description graph of a concept
description in $\mathcal{S}$ is linear in the size of the corresponding description we obtain
the following theorem.

Figure 9: A finite graph representing the lcs of $C$ and $D_2$

Theorem 7 The lcs of two S-concept descriptions may grow exponentially in the size of 
the concepts. 
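
The counting argument can be replayed on the product of the completions. The Python sketch
below builds a graph for $D_k$ according to our reading of Figure 8 (two chains of
$c$/$d$ diamonds joined by a final $a$-edge, which is an assumption), encodes tree nodes by
their youngest ancestor and the word leading to them, and counts the distinct product nodes
reached by the words $ax$ with $x \in \{c, d\}^k$. All names are hypothetical.

```python
from itertools import product

def step(graph, node, a):
    """One a-step in the lazily built completion of a finite deterministic graph."""
    if isinstance(node, tuple) and node[0] == "tree":
        return ("tree", node[1], node[2] + a)
    successor = graph[node].get(a)
    return successor if successor is not None else ("tree", node, a)

def product_run(gc, gd, word, root=("n0", "n0")):
    x, y = root
    for a in word:
        x, y = step(gc, x, a), step(gd, y, a)
    return (x, y)

def graph_D(k):
    """Assumed canonical graph of D_k for k >= 1."""
    g = {"n0": {"a": "a0", "b": "b0"}, "z": {}}
    for side in "ab":
        for i in range(k):
            g[f"{side}{i}"] = {"c": f"{side}{i + 1}", "d": f"{side}{i + 1}"}
        g[f"{side}{k}"] = {"a": "z"}  # a c^k a and b c^k a meet in z
    return g

G_C = {"n0": {"a": "n1", "b": "n1"}, "n1": {}}

def distinct_nodes(k):
    """Number of product nodes reached by a·x for x in {c, d}^k."""
    g_dk = graph_D(k)
    words = ["a" + "".join(x) for x in product("cd", repeat=k)]
    return len({product_run(G_C, g_dk, w) for w in words})
```

Under these assumptions `distinct_nodes(k)` equals `2 ** k`, while $axa$ and $bxa$ always
end at the same product node, in line with the lower bound argued above.
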

The following (exponential time) algorithm computes the lcs of two $\mathcal{S}$-concept
descriptions in case it exists.

Algorithm 3

Input: concept descriptions $C$, $D$ in $\mathcal{S}$, for which the lcs exists in $\mathcal{S}$;
Output: lcs of $C$ and $D$ in $\mathcal{S}$;

1. Compute $G' := G_C \times G_D$;

2. For every combination

• of nodes $(h_1, p_0)$, $(h_2, p_0)$ in $G = G_C \times G_D$, $h_1 \neq h_2$;

• $a \in A$, $e_1, e_2, f$ in $G_C$, $e_1 \neq e_2$, where $(e_1, a, f)$ and $(e_2, a, f)$ are
edges in $G_C$

extend $G'$ as follows: Let $G_{h_1,t}$, $G_{h_2,t}$ be two trees representing the (finite) set
of words

$$L := L_{G_C}(h_1, e_1) \cap L_{G_C}(h_2, e_2) \cap \bigcup_{b \notin succ(p_0)} bA^*$$

where $succ(p_0) := \{b \mid p_0$ has a $b$-successor$\}$ and the sets of nodes of $G_{h_1,t}$,
$G_{h_2,t}$, and $G'$ are assumed to be disjoint. Now, replace the root of $G_{h_1,t}$ by
$(h_1, p_0)$, the root of $G_{h_2,t}$ by $(h_2, p_0)$, and extend $G'$ by the nodes and edges of
these two trees. Finally, add a new node $n_v$ for every word $v$ in $L$, and for each node of
the trees $G_{h_1,t}$ and $G_{h_2,t}$ reachable from the root of $G_{h_1,t}$ and $G_{h_2,t}$ by a
path labeled $v$, add an edge with label $a$ from it to $n_v$. The extension is illustrated in
Figure 10.

3. The same as in step 2, with the roles of $C$ and $D$ switched.

4. Compute the canonical graph of $G'$, which is called $G'$ again. Then, output the concept
description $C_{G'}$ of $G'$.



Proposition 3 The translation $C_{G'}$ of the graph $G'$ computed by Algorithm 3 is the lcs $E$
of $C$ and $D$.

Figure 10: The extension at the nodes $(h_1, p_0)$, $(h_2, p_0)$ in $G'$ where $L = \{b, bc, bad\}$



Proof. It is easy to see that if there are two paths in $G'$ labeled $y_1$ and $y_2$ leading
from the root $(n_C, n_D)$ to the same node, then $G^\infty_\times$ contains such paths as well.
Consequently, $(E \equiv_t)\ G^\infty_\times \sqsubseteq_t G'$.

Now, assume $E \sqsubseteq_t y_1 \downarrow y_2$, $y_1 \neq y_2$. By Proposition 2 we know that
there are paths in $G^\infty_\times$ labeled $y_1$ and $y_2$ leading to the same node $n$.
W.l.o.g., we may assume that $n$ is a same-as node in $G^\infty_\times$. Otherwise, there exist
words $y_1', y_2', u$ with $y_1 = y_1'u$, $y_2 = y_2'u$ such that $y_1'$ and $y_2'$ lead to a
same-as node. If we can show that $G'$ contains paths labeled $y_1'$ and $y_2'$ leading to the
same node, then, by Algorithm 2, this is sufficient for $G' \sqsubseteq_t y_1 \downarrow y_2$.
So let $n$ be a same-as node. We distinguish two cases:

1. If $n$ is a node in $G = G_C \times G_D$, then the paths for $y_1$ and $y_2$ are paths in $G$.
Since $G$ is a subgraph of $G'$ this holds for $G'$ as well. Hence,
$C_{G'} \sqsubseteq_t y_1 \downarrow y_2$.

2. Assume $n$ is not a node in $G$. Then, since $n$ is a same-as node, we know that $n$ is of the
form $(f, t)$ or $(t, f)$ where $f$ is a non-tree node and $t$ is a tree node. By symmetry, we
may assume that $n = (f, t)$. Now it is easy to see that there exist nodes $h_1, h_2, e_1, e_2$
in $G_C$, $p_0$ in $G_D$, and a tree node $q_0$ in $G_D^\infty$ as well as $a \in A$ and
$x, v, w \in A^*$ as specified in Lemma 7 such that $y_1 = vxa$ and $y_2 = wxa$. But then, with
$h_1, h_2, e_1, e_2, p_0, f$ and $a$ the preconditions of Algorithm 3 are satisfied and
$x \in L$. Therefore, by construction of $G'$ there are paths labeled $y_1$ and $y_2$,
respectively, leading from the root to the same node. □

We note that the product $G$ of $G_C$ and $G_D$ can be computed in time polynomial in the
size of $C$ and $D$. Furthermore, there is only a polynomial number of combinations of nodes
$(h_1, p_0)$, $(h_2, p_0)$ in $G$, $e_1, e_2, f$ in $G_C$, and $a \in A$. Finally, the finite
automaton for $L$ can be computed in time polynomial in the size of $C$ and $D$. In particular,
the number of states of this automaton can be polynomially bounded in the size of $C$ and $D$.
If $L$ contained a word longer than the number of states, the accepting path in the automaton
would contain a cycle. But then, the automaton would accept infinitely many words, in
contradiction to the assumption that $L$ is finite. Thus, the length of all words in $L$ can be
bounded polynomially in the size of $C$ and $D$. In particular, this means that $L$ contains at
most an exponential number of words. Trees representing these words can be computed in time
exponential in the size of $C$ and $D$.

Corollary 9 If the lcs of two $\mathcal{S}$-concept descriptions exists, then it can be computed
in time exponential in the size of the concept descriptions.

6. Conclusion 

Attributes, i.e., binary relations that can have at most one value, have been distinguished
in many knowledge representation schemes and other object-centered modeling languages.
This has been done to facilitate modeling and, in description logics, to help identify
tractable sets of concept constructors (e.g., restricting same-as to attributes). In fact,
same-as restrictions are quite important from a practical point of view, because they support
the modeling of actions and their components (Borgida & Devanbu, 1999).

A second distinction, between attributes as total versus partial functions, had not been
considered so essential until now. This paper has shown that this distinction can sometimes
have significant effects.

In particular, we have first shown that the approach for computing subsumption of
Classic concepts with total attributes, presented by Borgida and Patel-Schneider (1994),
can be modified to accommodate partial attributes, by treating partial attributes as roles
until they participate in same-as restrictions, in which case they are "converted" to total
attributes. As a result, we obtain polynomial-time algorithms for subsumption and
consistency checking in this case also.

In the case of computing least common subsumers, which was introduced as a technique 
for learning non-propositional descriptions of concepts, we first noted that several of the 
papers in the literature (Cohen &: Hirsh, 1994a; Frazier &: Pitt, 1996) (implicitly) used 
partial attributes, when considering Classic. Furthermore, these papers used a weaker 
version of the "concept graphs" employed in (Borgida &: Patel-Schneider, 1994), which 
make the results only hold for the case of same-as restrictions that do not generate "cycles" . 
Furthermore, the algorithm proposed by Frazier and Pitt (1996) does not handle inconsistent 
concepts, which can easily arise in Classic concepts as a result of conflicts between lower 
and upper bounds of roles. 

Therefore, we have provided an lcs algorithm together with a formal proof of correctness
for a sublanguage of Classic with partial attributes, which allows for same-as equalities
and inconsistent concepts; the algorithm and proofs can easily be extended to full Classic
(Kusters & Borgida, 1999). In this case, the lcs always exists, and it can be computed in
time polynomial in the size of the two initial concept descriptions. However, as shown by
Cohen et al. (1992), there are sequences of concept descriptions for which the lcs may grow
exponentially in the size of the sequence.

To complete the picture, and as the main part of the paper, we then examined the 
question of computing lcs in the case of total attributes. Surprisingly, the situation here 
is very different from the partial attribute case (unlike with subsumption). First, for the 
language S the lcs may not even exist. (The existence of the lcs mentioned by Cohen et al. 
(1992) is due to an inadvertent switch to partial semantics for attributes.) Nevertheless, 
the existence of the lcs of two concept descriptions can be decided in polynomial time. But 
if the lcs exists, it may grow exponentially in the size of the concept descriptions, and hence 
the computation of the lcs may take time exponential in the size of the two given concept 
descriptions. 



As an aside, we note that it has been pointed out by Cohen et al. (1992) that concept 
descriptions in S correspond to a finitely generated right congruence. Furthermore, in this 
context the lcs of two concept descriptions is the intersection of right congruences. Thus, 
the results presented in this paper also show that the intersection of finitely generated 
right congruences is not always a finitely generated right congruence, and that there is a
polynomial-time algorithm for deciding this question. Finally, if the intersection can be
finitely generated, then its generating system may be of exponential size, and it can be
computed in time exponential in the size of the generating systems of the given right
congruences.

The results in this paper therefore lay out the scope of the effect of making attributes 
be total or partial functions in a description logic that supports the same-as constructor. 
Moreover, we correct some problems and extend results in the previous literature. 

We believe that the disparity between the results in the two cases should serve as a 
warning to other researchers in knowledge representation and reasoning, concerning the 
importance of explicitly considering the difference between total and partial attributes. 